ABSTRACT
Network on Chip (NoC) is a scalable and flexible communication
architecture for the design of core-based System-on-Chip.
The communication performance of a NoC depends heavily on the
routing algorithm. The XY routing algorithm is a distributed
deterministic routing algorithm. The Odd-Even (OE) routing
algorithm is a distributed adaptive routing algorithm with
deadlock-free ability. DyAD combines the advantages of both
deterministic and adaptive routing schemes. The key metrics that
determine the best performance of routing algorithms for
Network-on-Chip architectures are minimum latency,
minimum power and maximum throughput. We demonstrate
the impact of traffic load (bandwidth) variations on average
latency and total network power for three routing algorithms,
XY, OE and DyAD, on a 3x3 2-dimensional mesh topology. The
simulation is performed on the NIRGAM NoC simulator version 2.1
for constant bit rate traffic conditions. The simulation results
reveal the dominance of DyAD over the XY and OE algorithms,
with minimum values of overall average latency per
channel (in clock cycles per flit) of 1.58871, overall average
latency per channel (in clock cycles per packet) of 9.53226,
overall average latency (in clock cycles per flit) of 26.105, and
total network power of 0.1771 milliwatts, achieved for the DyAD
routing algorithm.


Keywords: XY routing algorithm; OE routing algorithm; DyAD
routing algorithm


1. INTRODUCTION
Network on Chip (NoC) is a new paradigm for System on Chip
(SoC) design [1-5]. With growing complexity and increasing
integration, the commonly used interconnection technique for
SoC architectures, the bus structure, poses practical
physical problems. In the NoC paradigm, cores are connected to
each other through a network of routers, and they communicate
among themselves through packet-switched communication.
The protocols used in NoC are generally simplified versions of
the general communication protocols used in data networks. This
makes it possible to use accepted and mature concepts of
communication networks, such as routing algorithms, switching
techniques, and flow and congestion control, in Network-on-Chip
architecture. It allows significant reuse of resources and
provides a highly scalable and flexible communication
infrastructure for SoC design.

Data communications between segments of the chip are packetized
and transferred through the network. The network consists of
wires and routers. Processors, memories and other IP
(Intellectual Property) blocks are connected to routers. A routing
algorithm plays a significant role in the network's operation.

Figure 1: 4x4 2-Dimensional Mesh NoC

Different routing algorithms are targeted at different
applications. Several routing algorithms need to be investigated
and designed with various features and purposes.

This section has introduced the Network-on-Chip architecture.
Section 2 explains the three basic routing algorithms, namely
the XY, Odd-Even and DyAD routing algorithms, in greater detail.
Section 3 describes the architecture of a 3x3 2-dimensional
mesh topology based NoC. Section 4 discusses the simulation
results and analysis of the proposed work. Section 5 concludes
the paper.


2. XY, OE AND DYAD ROUTING ALGORITHM
The routing algorithm, which defines the path taken by a
packet between the source and the destination, is a main task in
the network layer design of a NoC. According to where routing
decisions are taken, routing can be classified as source routing
or distributed routing [6].

Routing algorithms can also be classified on the basis of
adaptivity as deterministic or adaptive. In deterministic routing,
the path from source to destination is completely determined in
advance by the source and destination addresses. An example is
XY routing. In adaptive routing, multiple paths from source to
destination are possible [7]. There are also partially adaptive
routing algorithms, which restrict certain paths for
communication in order to satisfy deadlock restrictions. An
example is Odd-Even routing. They are simple and easy to
implement compared to fully adaptive routing algorithms. A
routing algorithm that uses the shortest path for communication
is called minimal routing. A routing algorithm that uses longer
paths even though shorter paths exist is known as non-minimal
routing. Non-minimal routing has some advantages over minimal
routing, including the possibility of balancing network load and
fault tolerance. In static routing, the path cannot be changed
after a packet leaves the source. In dynamic routing, a path can
be altered at any time depending upon the network conditions.
Routing algorithms can also be distinguished by their
implementation: lookup table or Finite State Machine (FSM).
In the following text, three different routing algorithms are
described in detail.


2.1 XY ROUTING

The XY routing algorithm is a distributed deterministic routing
algorithm. XY routing never runs into deadlock or livelock [8].
For a 2-dimensional mesh topology NoC, each router can be
identified by its coordinate (x, y) (Fig. 2). The XY routing
algorithm compares the current router address (Cx,Cy) to the
destination router address (Dx,Dy) of the packet, stored in the
header flit [9]. Flits must be routed to the core port of the
router when the (Cx,Cy) address of the current router is equal
to the (Dx,Dy) address.

If this is not the case, the Dx address is first compared to the
Cx (horizontal) address. Flits will be routed to the East port
when Cx<Dx and to West when Cx>Dx; if Cx=Dx, the header
flit is already horizontally aligned. If this last condition is true,
the Dy (vertical) address is compared to the Cy address. Flits
will be routed to South when Dy<Cy and to North when Dy>Cy,
the same convention as in the listing that follows. If
the chosen port is busy, the header flit as well as all subsequent
flits of this packet will be blocked. The routing request for this
packet will remain active until a connection is established in
some future execution of the procedure in this router.

The XY routing algorithm is given below:

/* XY routing algorithm */
/* Source router: (Sx,Sy); destination router: (Dx,Dy); current
   router: (Cx,Cy). */
begin
    if (Dx>Cx) //eastbound messages
        return E;
    else if (Dx<Cx) //westbound messages
        return W;
    else { //Dx=Cx: currently in the same column as destination
        if (Dy<Cy) //southbound messages
            return S;
        else if (Dy>Cy) //northbound messages
            return N;
        else //Dy=Cy: current router is the destination router
            return C;
    }
end

The implementation of the XY routing algorithm is simple.
However, it is a deterministic routing algorithm, which means
that it provides only one routing path for each pair of
source and destination. As a result, although XY routing itself
never deadlocks, it cannot steer packets around congested
channels.
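
As a minimal sketch of the listing above, the following Python code mirrors the XY decision for a single hop and then traces the complete path. The names xy_route, xy_path and STEP are illustrative and do not come from the paper. The sketch assumes that y grows toward North (as the Dy>Cy test in the listing implies) and that both routers lie inside the mesh.

```python
# Coordinate change for each output port; y is assumed to grow toward North.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(cur, dst):
    """Return the output port (E, W, S, N or C) chosen at router `cur`."""
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "E"  # eastbound
    if dx < cx:
        return "W"  # westbound
    if dy < cy:
        return "S"  # same column, southbound
    if dy > cy:
        return "N"  # same column, northbound
    return "C"      # current router is the destination


def xy_path(src, dst):
    """Return the list of routers visited from `src` to `dst`."""
    path = [src]
    cur = src
    while True:
        port = xy_route(cur, dst)
        if port == "C":
            return path
        mx, my = STEP[port]
        cur = (cur[0] + mx, cur[1] + my)
        path.append(cur)
```

Because the X offset is always removed first, every source and destination pair yields exactly one path, which is the property the paragraph above describes.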


2.2 ODD-EVEN ROUTING (OE)

The OE routing algorithm is a distributed adaptive routing
algorithm based on the odd-even turn model [10]. It imposes
some restrictions on turns in order to avoid deadlock.
The odd-even turn model facilitates deadlock-free
routing in two-dimensional (2D) meshes with no virtual
channels.

In a two-dimensional mesh with dimensions X*Y, each node is
identified by its coordinate (x, y) [9]. In this model, a column is
called even if its x coordinate is an even number, and odd if
its x coordinate is an odd number. A turn involves a 90-degree
change of traveling direction. There are eight types of turns,
according to the traveling directions of the associated channels.
A turn is called an ES turn if it involves a change of direction
from East to South. Similarly, we can define the other seven
types of turns, namely EN, WS, WN, SE, SW, NE, and NW turns,
where E, W, S, and N indicate East, West, South, and North,
respectively. There are two main theorems in the odd-even
algorithm:

Theorem 1: No packet is permitted to take an EN turn at a node
located in an even column. Also, no packet is permitted to take
an NW turn at a node located in an odd column.

Theorem 2: No packet is permitted to take an ES turn at a node
located in an even column. Also, no packet is permitted to take
an SW turn at a node located in an odd column.
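
The two theorems can be read as a lookup on the parity of the column in which the turn is taken. The sketch below is only an illustration of that reading; FORBIDDEN and turn_allowed are hypothetical names, and a turn is written as two letters (travel direction before, then after).

```python
# Turns ruled out by Theorems 1 and 2, keyed by the parity of the column
# in which the turn would be taken.
FORBIDDEN = {
    "even": {"EN", "ES"},  # Theorem 1 (EN) and Theorem 2 (ES)
    "odd": {"NW", "SW"},   # Theorem 1 (NW) and Theorem 2 (SW)
}


def turn_allowed(turn, column_x):
    """Return True if `turn` may be taken at a node in column `column_x`."""
    parity = "even" if column_x % 2 == 0 else "odd"
    return turn not in FORBIDDEN[parity]
```

For example, turn_allowed("EN", 0) is False while turn_allowed("EN", 1) is True, so an eastbound packet that still has to go North must turn in an odd column.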

The following text is a minimal OE routing algorithm, in which
avail_dimension_set contains the dimensions that are available for
forwarding the packet:

/* OE routing algorithm */
/* Source router: (Sx,Sy); destination router: (Dx,Dy); current
   router: (Cx,Cy). */
begin
    avail_dimension_set <- empty;
    Ex <- Dx-Cx;
    Ey <- Dy-Cy;
    if (Ex=0 && Ey=0) //current router is destination
        return C;
    if (Ex=0) { //current router in same column as destination
        if (Ey<0)
            add S to avail_dimension_set;
        else
            add N to avail_dimension_set;
    }
    else {
        if (Ex>0) { //eastbound messages
            if (Ey=0) { //current in same row as destination
                add E to avail_dimension_set;
            }
            else {
                if (Cx % 2 != 0 or Cx=Sx) //N/S turn allowed only in odd column
                    if (Ey<0)
                        add S to avail_dimension_set;
                    else
                        add N to avail_dimension_set;
                if (Dx % 2 != 0 or Ex != 1) {
                    //allow to go E only if destination is odd column
                    add E to avail_dimension_set;
                    //because N/S turn not allowed in even column
                }
            }
        }
        else { //westbound messages
            add W to avail_dimension_set;
            if (Cx % 2 = 0) //allow to go N/S only in even column, because
                            //N->W and S->W not allowed in odd column
                if (Ey<0)
                    add S to avail_dimension_set;
                else
                    add N to avail_dimension_set;
        }
    }
    //Select a dimension from avail_dimension_set to forward the packet.
end

The OE routing algorithm is more complex than the XY routing
algorithm. However, it is an adaptive routing algorithm: for a
pair of source and destination, it can provide a group of
routing paths, and it prevents deadlock.
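
The OE listing can be sketched in Python as a function that returns the set of admissible output ports for one hop. The name oe_route is illustrative. One deliberate deviation is labeled in the code: in the westbound case the sketch adds a vertical port only when Ey is non-zero, which is an assumption made here to keep every offered hop minimal; it is not stated in the listing.

```python
def oe_route(src, cur, dst):
    """Return the set of admissible output ports at router `cur`."""
    sx, _ = src
    cx, cy = cur
    dx, dy = dst
    ex, ey = dx - cx, dy - cy
    if ex == 0 and ey == 0:
        return {"C"}                      # current router is the destination
    vertical = "S" if ey < 0 else "N"     # only used when ey != 0
    avail = set()
    if ex == 0:                           # same column as destination
        avail.add(vertical)
    elif ex > 0:                          # eastbound
        if ey == 0:
            avail.add("E")
        else:
            if cx % 2 != 0 or cx == sx:   # N/S turn only in odd or source column
                avail.add(vertical)
            if dx % 2 != 0 or ex != 1:    # avoid arriving east in an even column
                avail.add("E")
    else:                                 # westbound
        avail.add("W")
        if cx % 2 == 0 and ey != 0:       # ey != 0 guard is an assumption
            avail.add(vertical)
    return avail
```

A selection policy (for example, the least loaded port) then picks one entry from the returned set; when the set has a single entry the hop is forced.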


2.3 DYAD ROUTING 

DyAD combines the advantages of both deterministic and 
adaptive routing schemes [11]. DyAD is a routing technique 
which judiciously switches between deterministic and adaptive 
routing based on the network's congestion conditions. Compared
to purely adaptive routers, the overhead of implementing 
DyAD is negligible, while the performance is consistently 
better. 

With DyAD routing each router in the network continuously 
monitors its local network load and makes decisions based on 
this information. When the network is not congested, a DyAD 
router works in a deterministic mode, thus enjoying the low 
routing latency enabled by deterministic routing. On the 
contrary, when the network becomes congested, the DyAD 
router switches back to the adaptive routing mode and thus 
avoids the congested links by exploiting other routing paths; 
this leads to higher network throughput which is highly 
desirable for applications.
The freedom from deadlock and livelock [8] can be guaranteed
when mixing deterministic and adaptive routing modes in the
same NoC.
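
The mode switch described above can be sketched as follows. This is an illustration under stated assumptions, not the DyAD implementation: the congestion test (any candidate port at or above a threshold), the threshold value, the occupancy map and the fixed PRIORITY order are all hypothetical, and the sketch says nothing about deadlock freedom, which depends on how the admissible set is produced.

```python
# Fixed order used for the deterministic choice and for tie-breaking
# (an assumption of this sketch).
PRIORITY = ("C", "E", "W", "N", "S")


def dyad_select(admissible, occupancy, threshold=0.5):
    """Pick one output port from a non-empty `admissible` set.

    `occupancy` maps a port to the fraction of the downstream buffer in
    use (hypothetical local load measure); missing ports count as 0.0.
    """
    candidates = [p for p in PRIORITY if p in admissible]
    congested = any(occupancy.get(p, 0.0) >= threshold for p in candidates)
    if not congested:
        # Deterministic mode: always the same port for the same candidates.
        return candidates[0]
    # Adaptive mode: steer toward the least loaded admissible port.
    return min(candidates, key=lambda p: occupancy.get(p, 0.0))
```

With light load the function behaves like a deterministic router; once a candidate port crosses the threshold it falls back to choosing by load.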


3. ARCHITECTURE OF 2-DIMENSIONAL 3X3 MESH
TOPOLOGY NOC

The routing algorithms are simulated on a 2-dimensional
3x3 mesh topology NoC (Fig. 2). In Fig. 2, each circle
represents a tile in the network. Each tile consists of an IP core
connected to a router by a bidirectional core channel (C). A tile
is connected to neighboring tiles by four bidirectional channels
(N, E, S and W). Each tile is identified by a unique integer ID.
Each tile can also be identified by a pair of x and y
coordinates. Our 2-dimensional 3x3 mesh topology NoC is
designed using the wormhole switching mechanism, in which
packets are divided into flits. A packet consists of three types of
flits: head flit, data flit and tail flit. All the three
routing algorithms; XY routing algorithm, OE routing 
algorithm and DyAD routing algorithms are based on these 





Figure 2: Architecture of a 2-dimensional 3x3 mesh topology
based NoC
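
The two ways of identifying a tile (integer ID and coordinate pair) can be related by a small helper. The row-major numbering used below, with tile 0 at (0, 0) and y growing toward North, is an assumption of this sketch; the paper does not state the numbering, and the function names are illustrative.

```python
COLS, ROWS = 3, 3  # 3x3 mesh


def tile_to_xy(tile_id):
    """Map an integer tile ID to (x, y), assuming row-major numbering."""
    return tile_id % COLS, tile_id // COLS


def xy_to_tile(x, y):
    """Inverse of tile_to_xy."""
    return y * COLS + x


def neighbors(tile_id):
    """Return {port: neighbor tile ID} for the channels that exist."""
    x, y = tile_to_xy(tile_id)
    steps = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
    result = {}
    for port, (mx, my) in steps.items():
        nx, ny = x + mx, y + my
        if 0 <= nx < COLS and 0 <= ny < ROWS:
            result[port] = xy_to_tile(nx, ny)
    return result
```

Only the centre tile of a 3x3 mesh has all four neighbors; edge tiles have three and corner tiles two, so the unused ports there have no channel attached.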


4. SIMULATION RESULTS AND ANALYSIS

The simulation is performed on NIRGAM (a simulator for NoC
Interconnect Routing and Application Modeling) version 2.1.
NIRGAM is an extensible and modular SystemC-based simulator
[12], as depicted in Fig. 3. Simulations of all three routing
algorithms are performed under the same traffic conditions and
simulation control. Tiles are attached to constant bit rate (CBR)
traffic generators. The packet size is 20 bytes, with random
destination mode. The percentage load, that is, the maximum
bandwidth to be utilized, is varied from 10 % to 100 % in steps
of 10 %. The interval between two successive flits is 2 clock
cycles. The simulation runs for 50000 clock cycles and the clock
frequency is 1 GHz. Synthetic traffic generators generate traffic
in the first 3000 clock cycles, with a warm-up period of 800 clock
cycles.
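
For orientation, the parameters above can be collected and converted to wall-clock time as in the sketch below. The constant names are illustrative and are not NIRGAM configuration keys.

```python
CLOCK_HZ = 1_000_000_000      # 1 GHz clock
SIM_CYCLES = 50_000           # total simulated clock cycles
TRAFFIC_CYCLES = 3_000        # cycles during which traffic is generated
WARMUP_CYCLES = 800           # warm-up period
FLIT_INTERVAL_CYCLES = 2      # gap between successive flits
PACKET_BYTES = 20             # packet size

# Load sweep: 10 % to 100 % in steps of 10 %.
loads_percent = list(range(10, 101, 10))

# Cycles per microsecond at the given clock, then durations in microseconds.
cycles_per_us = CLOCK_HZ / 1_000_000
sim_time_us = SIM_CYCLES / cycles_per_us
traffic_time_us = TRAFFIC_CYCLES / cycles_per_us
```

At 1 GHz the whole run covers 50 microseconds of simulated time, of which traffic is injected during the first 3 microseconds, and each routing algorithm is evaluated at ten load points.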

Fig. 3 shows how the simulator is used in the proposed work,
detailing the inputs given to the simulator and the outputs taken
from it. The two measures of performance used for the routing
algorithms are overall average latency and total network power.
The overall average latency is measured in clock cycles per flit;
latency is also measured on a per-channel basis, in clock cycles
per flit and in clock cycles per packet. Total network power is
measured in milliwatts.


Figure 3: Inputs and outputs of the NIRGAM NoC simulator


Table 1 presents the simulation results for a 3x3 mesh topology
NoC, comparing the impact of percentage load variation
(bandwidth variation) on overall average latency per channel (in
clock cycles per flit) for the XY, OE and DyAD routing algorithms.

Fig. 4 shows the graphical representation of the simulation data
of Table 1: percentage load variation vs overall average
latency per channel (in clock cycles per flit) for the OE, XY and
DyAD routing algorithms.

Table 2 presents the simulation results for a 3x3 mesh topology
NoC, comparing the impact of percentage load variation
(bandwidth variation) on overall average latency per channel
(in clock cycles per packet) for the XY, OE and DyAD routing
algorithms.

Fig. 5 shows the graphical representation of the simulation data
of Table 2: percentage load variation vs overall average
latency per channel (in clock cycles per packet) for the OE, XY
and DyAD routing algorithms.

Table 3 presents the simulation results for a 3x3 mesh topology
NoC, comparing the impact of percentage load variation
(bandwidth variation) on overall average latency (in clock
cycles per flit) for the XY, OE and DyAD routing algorithms.

Fig. 6 shows the graphical representation of the simulation data
of Table 3: load vs overall average latency (in clock cycles
per flit) for the OE, XY and DyAD routing algorithms.

Table 4 presents the simulation results for a 3x3 mesh topology
NoC, comparing the impact of percentage load variation
(bandwidth variation) on total network power for the XY, OE and
DyAD routing algorithms.

Fig. 7 shows the graphical representation of the data of Table 4:
percentage load variation vs total network power for the
OE, XY and DyAD routing algorithms.

CONCLUSION

The routing algorithm is one of the network layer research topics
in NoC design, whose design approach can be adapted from a
protocol stack including the physical layer, data link layer,
network layer and transport layer. Based on a 2-dimensional 3x3
mesh topology NoC, three different routing algorithms, the XY,
OE and DyAD routing algorithms, are simulated on the NIRGAM
simulator platform, and the impact of percentage load variation
is compared on four parameters, namely overall average latency
per channel per packet, overall average latency per channel per
flit, overall average latency per flit and total network power.
Overall average latency and total network power are considered
important design criteria for judging the simulator as well as
the routing algorithm in NoC research.

The minimum values obtained are 1.58871 for overall average
latency per channel (in clock cycles per flit), 9.53226 for
overall average latency per channel (in clock cycles per packet),
26.105 for overall average latency (in clock cycles per flit), and
0.1771 milliwatts for total network power, all achieved with the
DyAD routing algorithm. Thus the proposed work shows the
dominance of the DyAD routing algorithm over the OE and XY
routing algorithms.
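
As a quick consistency check on the two per-channel figures, their ratio can be computed directly. Reading the ratio as the number of flits per packet is an inference made here, not a statement from the paper.

```python
per_flit_latency = 1.58871     # clock cycles per flit (per channel)
per_packet_latency = 9.53226   # clock cycles per packet (per channel)

# Ratio of the two reported minima.
ratio = per_packet_latency / per_flit_latency
print(round(ratio, 6))
```

The ratio comes out at 6 to within rounding, which would be consistent with each 20-byte packet being carried as six flits.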

Thus it is concluded that, compared to both deterministic and
adaptive routing, significant performance improvements in
terms of total network power as well as overall average latency
can be achieved by using the DyAD approach for constant bit
rate traffic conditions.


FUTURE SCOPE

Our conclusions apply only to a 2-dimensional 3x3 mesh
topology NoC. For other topologies, and for other parameters,
additional work needs to be done in the future.

Table 1: Simulation results for load variation versus overall
average latency per channel (in clock cycles per flit) for the XY,
OE and DyAD routing algorithms

Table 2: Simulation results for load variation versus
overall average latency per channel (in clock cycles per
packet) for the XY, OE and DyAD routing algorithms

Figure 5: Graph of percentage load variation vs overall
average latency per channel (in clock cycles per packet)

Figure 7: Load vs Total Network Power

Table 3: Simulation results for percentage load variation
versus overall average latency (in clock cycles per flit) for the
XY, OE and DyAD routing algorithms

Table 4: Simulation results for percentage load variation
versus total network power for the XY, OE and DyAD routing
algorithms

ABSTRACT

In this paper an attempt has been made to determine efficiency 
of semi transparent hybrid photovoltaic thermal double pass 
air collector for different PV technology and compare it with 
single pass air collector using artificial neural network (ANN) 
technique for New Delhi weather station of India. The 
MATLAB 7.1 neural networks toolbox has been used for 
defining and training of ANN for determination of thermal, 
electrical, overall thermal and overall exergy efficiency of the 
system. The ANN model uses ambient air temperature, number 
of sunshine hours, number of clear days, temperature 
coefficient, cell efficiency, global and diffuse radiation as input 
parameters. The transfer function, neural network 
configuration and learning parameters have been selected 
based on highest convergence during training and testing of 
network. About 2000 sets of data from four weather stations 
(Bangalore, Mumbai, Srinagar and Jodhpur) have been given 
as input for training and data of the fifth weather station (New 
Delhi) has been used for testing purpose. It has been observed 
that the best transfer function for a given configuration is 
logsig. The feed forward back-propagation algorithm has been 
used in this analysis. Further, the results of the ANN model have
been compared with analytical values on the basis of root mean
square error.


Keywords: Artificial neural network (ANN), Efficiency,
Photovoltaic thermal (PVT), Levenberg-Marquardt (LM),
Multi-layer perceptron (MLP), Mean Bias Error (MBE), Single
pass (SP), Double pass (DP).


1. INTRODUCTION

Due to the depletion of conventional energy sources, there have
been sincere efforts all over the world to harness renewable
energy resources. Solar energy is one of the significant
renewable energy sources that can be harnessed using
photovoltaic thermal systems. The major applications of solar
energy can be classified as thermal systems, which convert
solar energy into thermal energy, and photovoltaic (PV) systems,
which convert solar energy into electrical energy. The
integrated arrangement for utilizing thermal energy as well as
electrical energy with a photovoltaic module is referred to as
the hybrid PVT system: a PVT collector produces thermal and
electrical energy simultaneously. The development of sustainable
technologies requires an overall evaluation of the product's
environmental impacts and benefits. The solar cells currently in
the market have undergone environmental evaluations to be
classified as sustainable sources of energy. Over the last decade
there has been a rapid increase in PV energy generation devices.
The classification of photovoltaic technology available in the
market is given in Table 1. Since the late 1990s, new PV
technologies have begun to emerge commercially along with
more traditional Si-based systems. The emerging non-crystalline
silicon technologies have started making inroads into solar cell
markets. These thin film PV modules still constitute a tiny
fraction of the total PV market, but things may change quickly
as new manufacturers enter the market each year. Raugei and
Frankl (2008) have compared the energy cost of thin film PV
cells to that of crystalline systems. It has been observed that
the energy cost has dropped to 1 $/W (Fthenakis, 2009). The
performance of a PV cell can be described in terms of its energy
conversion efficiency, the percentage of incident solar radiation
that the cell converts into electricity under standard test
conditions.

Most parts of India receive an abundant quantity of solar energy
due to their geographical position, but it is difficult to obtain
measurements from all locations of interest, as measuring
devices are expensive to purchase, install and maintain. The
design of any cost effective system depends on reliable data,
for which accurate techniques are required. The ANN
methodology is a promising alternative to the traditional
approach for estimating solar radiation. Jiang (2008) has
developed a model for estimation of the monthly mean daily
diffuse solar radiation for eight typical cities in China. It has
been observed that the ANN-based estimation technique is more
suitable than the empirical regression models for estimation of
solar radiation. Leal et al. (2011) have measured, analyzed and
compared three different statistical models and two ANN
models for estimating the daily UV solar radiation from the
daily global radiation. It has been observed that the statistical
and ANN models have good statistical performance, with
RMSE lower than 5 % and MBE between 0.4 and 2 %. Koca et
al. (2011) have developed an ANN model for estimation of
future data on solar radiation for seven cities in the
Mediterranean region of Anatolia in Turkey. The obtained
results indicated that the method could be used by researchers
or scientists to design high efficiency solar devices.

ANNs have also been used for prediction of energy
consumption (Ekonomou, 2010; Tso & Yau, 2007; Kalogirou
& Bojic, 2000). Yoro et al. (2009) have applied the ANN
method for exergy analysis of thermodynamic systems and
presented the performance of the ANN method to emphasize
the definition of ANN inputs. Hui Xie et al. (2009) developed
an ANN to determine the performance of solar collectors for
Beijing, with 10 neurons in the hidden layer, a back
propagation learning algorithm and a logistic sigmoid transfer
function, with minimum RMSE. The performance parameters
ambient temperature of the collector, solar intensity, declination
angle, azimuth angle and tilt angle were used as training
data in the input layer for computing the efficiency and heating
capacity outputs. It has been observed that there is fair
agreement between the experimental results and the ANN model
for performance prediction of solar collectors. Caner et al.
(2011) have designed an ANN model using an LM-based MLP in
the Matlab nntool module to estimate the thermal performance of
two types of solar air collectors. The calculated and predicted
values of thermal performance have been compared and
statistical error analysis has been carried out to evaluate the
results. Further, the reliability of the ANN has been tested by
applying the stepwise regression method to the data used in
designing.
Almonacid et al. (2011) have compared the results of three
classical methods and an ANN method for estimating the annual
energy produced by a PV generator for Solar and Automatic
Energy at
the University of Jaen. It has been observed that the ANN method
provides better results than the alternative classical methods
studied, as it takes some second order effects, such as low
irradiance, angular and spectral effects, into consideration.
Ashhab (2007) has used the ANN technique for forecasting
photovoltaic solar integrated system efficiencies. Sozen et al.
(2008) have developed an ANN model to determine the
efficiency of flat plate solar collectors. The collector surface
temperature, date, time, solar radiation, declination angle,
azimuth angle and tilt angle have been used as the inputs, and
the efficiency of the flat plate solar collector has been used as
the output, with a logistic sigmoid transfer function in the
network. The results have shown that the maximum and minimum
deviations were 2.558484 and 0.001969 respectively. The
efficiency of solar cells has improved significantly over the
last few decades. However, realized values are much lower than
the theoretical limits.

In this paper, the efficiency of semi transparent single and double
pass air collectors for different PV technologies has been
evaluated considering four types of weather conditions, as
defined by Singh (2005). Table 2 shows the number of clear days
in different weather conditions for the New Delhi weather
station. The solar radiation data for different climates for four
weather stations (Bangalore, Mumbai, Srinagar and Jodhpur),
obtained from the Indian Meteorological Department, Pune, have
been used for training, and the data of the fifth weather station
(New Delhi) have been used for testing. The results of the ANN
models for semi transparent hybrid PVT single and double pass
air collectors for different PV technologies have been compared
with analytical values on the basis of RMSE.

2. ARTIFICIAL NEURAL NETWORK
2.1. THEORY
ANN models are computer programs designed to emulate human
information processing capabilities such as knowledge processing,
prediction and classification. The ability of an ANN to learn from
examples provides quick responses to new information. Although
ANNs are implemented on computers, they are not programmed to
perform specific tasks; rather, they are trained on data sets
until they learn the patterns presented as inputs. Once they are
trained, new patterns may be presented to the networks for
prediction or classification. An ANN model has three types of
layers: input, hidden and output. The neurons in the layers are
connected together in a network topology. The input neurons
receive data from the external environment, the hidden neurons
receive signals from neurons in the preceding layer, and the
output neurons send information back to the external
environment. The information is passed between neurons along
the interconnections. Each incoming connection has an input
value and a weight associated with it, and the output of the unit
is a function of the summed value. After summation, the net
input of the neuron is combined with the previous state of the
neuron to produce a new activation value. The activation is then
passed through an output or transfer function that generates the
actual neuron output. The transfer function modifies the value of
the output signal. This function can be either a simple threshold
function that only produces output if the combined input is
greater than the threshold value, or a continuous function that
changes the output based on the weight of the combined input.
When the signal reaches the last node, an appropriate output is
generated. This output, when compared to the desired output,
gives the error. The error during learning is measured as the
mean square error (MSE). This error is back propagated to the
nodes to readjust the weights through the adaptation learning
function. The complete cycle is called an iteration, and the set
of inputs is called an epoch. Many epochs are applied to get the
desired output and train the network. Accuracy of the training
data is vital for the development of an efficient model that can
provide accurate predictions. Once the network is trained, it
can be used for estimation and analytical purposes. The trained
model is assumed to be successful if it gives good results for
the test set. To ensure that ANN models provide correct
predictions or classifications, the prediction results produced
by ANN models can be validated, for example against the results
of other computer programs.
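
The weighted sum followed by a transfer function can be sketched in a few lines. The sketch below assumes a plain feed-forward pass with a logistic sigmoid (logsig) transfer function and a bias term per neuron; the function names are illustrative, and the state-dependent activation mentioned above is left out.

```python
import math


def logsig(n):
    """Logistic sigmoid transfer function, mapping any real n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through logsig."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return logsig(net)


def layer_output(inputs, weight_rows, biases):
    """Outputs of one layer; weight_rows holds one weight list per neuron."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]
```

Feeding the outputs of one layer as the inputs of the next gives the forward pass of a multi-layer network; training then adjusts the weights and biases to reduce the error at the output.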


2.2. DESCRIPTION AND DESIGN OF ANN

ANN modeling has been done to estimate the electrical, thermal,
overall thermal and exergy efficiency for the arrangement
shown in Fig. 1(a) and (b) for single and double pass
respectively. The ANN model uses ambient air temperature,
number of sunshine hours, number of clear days, temperature
coefficient, module efficiency, and global and diffuse radiation
as input parameters, and thermal, electrical, overall thermal and
exergy efficiency as output parameters, for the experimental
setup shown in Fig. 3(a) and (b).


Determination of Efficiency of Hybrid Photovoltaic Thermal Air Collectors Using Artificial Neural Network Approach for Different PV Technology


Fig. 2(a) and (b) represent the typical layout of the ANN,
showing the network nodes along with biases and weights
for single and double pass respectively. The network type is
selected as feed forward back propagation. The ANN model
has a four-layer feed forward back propagation neural network
architecture: an input layer of seven neurons, two hidden layers
of twenty and twenty five neurons for single and double pass air
collector respectively, and an output layer of four neurons. The
hidden layers have a 'tan-sigmoid' activation function $\phi$,
defined by the logistic function as

$\phi = 1/(1 + e^{-n})$, where n is the corresponding input.

For the output layer, a logsig activation function is used. The
inputs have been normalized to the (0, 1) range. A set of 2000
epochs has been taken for training purposes. The MATLAB
Neural Network Toolbox is used for the implementation of the
feedforward network. The supervised back propagation training
technique has been used. TRAINLM has been selected as the
training function and MSE has been taken as the performance
function. This training function updates the weights and bias
values in accordance with LM optimization. In order to train the
network, solar radiation data for different climates have been
obtained from the Indian Meteorological Department, Pune.

The following parameters are set while training the feed
forward neural network: training patterns 2000, learning rate
0.001, MSE training goal 0.005, number of training iterations
1250, momentum 0.94. The training patterns are presented
repeatedly to the ANN model and the adjustment is performed
after each iteration whenever the network's computed output is
different from the desired output. After several adjustments to
the network parameters, the network converged to a threshold of
0.00001 using hidden nodes. The accuracy of the trained ANN
model was validated using other sets of data, different from
those used for the training process, and the mean square error
is 0.005. The RMSE varies from 0.0568 to 4.7633 % for different
output parameters. The results demonstrate that the ANN based
model developed in this work can predict the efficiency at any
point in time with high accuracy.

3. THEORETICAL ANALYSIS OF AIR COLLECTOR 

3.1 SINGLE PASS AIR COLLECTOR 

The cross sectional view of the semi transparent single pass air
collector is shown in Fig. 3(a). There is a provision of a
duct below the PV module. The air is passed through one end
of the duct, gets warm by picking up thermal energy from
the backside of the PV module, and exits from the other end of
the duct. The duct has been insulated to minimize the heat loss.


3.2 DOUBLE PASS AIR COLLECTOR: 

The cross sectional view of the semi transparent hybrid PV or
double pass collector is shown in Fig. 3(b). There is
provision for two ducts. The two ducts are connected in series
at the end. The air flowing in the upper duct is exposed to the
solar radiation. Due to this exposure, the temperature of the air in
the outer duct increases. The heated air is circulated through
the inner duct and gets further heated due to the increase in the
temperature of the semi transparent PV module. The useful
thermal energy thus obtained from the hybrid PVT double pass air
collector can be used in buildings for space heating in cold
climatic conditions.


3.3. THERMAL ENERGY AND EXERGY ANALYSIS
The hourly rate of useful thermal energy of semi transparent
hybrid PVT air collector is calculated as

$$\dot{q}_u = \dot{m}_a C_a (T_o - T_i) \qquad (1)$$

The daily thermal energy output in kWh of the semi transparent
hybrid PVT air collector can be expressed as

$$Q_{u,daily} = \sum_{i=1}^{N} \frac{\dot{q}_{u,i}}{1000} \qquad (1a)$$

The monthly thermal energy output in kWh of the semi
transparent hybrid PVT air collector can be expressed as

$$Q_{u,monthly} = Q_{u,daily} \times n_0 \qquad (1b)$$

The annual thermal energy output can be evaluated by using
hourly equation

$$Q_{u,annual} = \sum_{k=1}^{12} \frac{\dot{q}_u}{1000} \times N \times n_0 \qquad (1c)$$
The annual exergy of semi transparent hybrid PVT air collector
is calculated as

$$Ex_{thermal} = Q_{u,annual} \left[ 1 - \frac{T_a + 273}{T_o + 273} \right] \qquad (2)$$

The expression for outlet air temperature ($T_o$) in Eq. (2) is
given by Kamthania et al. [17, 18].
The thermal efficiency of semi transparent hybrid PVT air
collector can be expressed as

$$\eta_{th} = \frac{\dot{q}_u}{I(t) \times b \times L} \qquad (3)$$


3.4 ELECTRICAL AND EQUIVALENT THERMAL ANALYSIS
The hourly electrical energy can be written as

$$E_{el} = \eta_{el} \times A \times I(t) \qquad (4)$$

The annual electrical energy can be obtained as

$$(E_{el})_{annual} = \eta_{el} \times A \times I(t)_{avg} \times N \times n_0 \qquad (5)$$

The temperature dependent electrical efficiency of the PV module
can be written as

$$\eta_{el} = \eta_0 \left[ 1 - \beta (T_c - T_0) \right] \qquad (6)$$

where $T_0$ is 25°C (under standard test condition) and the values of
cell efficiency ($\eta_0$) and temperature coefficient ($\beta$) for different
PV technologies are given in Table 3.
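As a rough illustration of the temperature dependence described above, the sketch below evaluates a linear efficiency model of the form efficiency = eta_0 * (1 - beta * (T_c - T_0)) with T_0 = 25°C. The function name and the sample values of eta_0, beta and cell temperature are hypothetical and are not taken from Table 3.

```python
def electrical_efficiency(eta_0, beta, cell_temp_c, ref_temp_c=25.0):
    """Temperature dependent electrical efficiency of a PV module.

    eta_0       : efficiency at the reference temperature (fraction)
    beta        : temperature coefficient (per deg C)
    cell_temp_c : cell temperature in deg C
    ref_temp_c  : reference temperature, 25 deg C under standard test condition
    """
    return eta_0 * (1.0 - beta * (cell_temp_c - ref_temp_c))


# Hypothetical module: 10 % efficient at 25 deg C, beta = 0.002 per deg C.
at_reference = electrical_efficiency(0.10, 0.002, 25.0)
at_hot_cell = electrical_efficiency(0.10, 0.002, 75.0)
```

A hotter cell gives a lower value, which is the mechanism behind the seasonal variation of electrical efficiency discussed in the results.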


BVICAM'’s International Journal of Information Technology (BIJIT) 


The equivalent thermal energy can be calculated as

$$Q_{el,thermal} = \frac{(E_{el})_{annual}}{0.38} \qquad (7)$$

The 0.38 is the conversion factor from thermal to electrical
energy for thermal power plants by Huang et al. [19].


3.5. OVERALL THERMAL ENERGY AND EXERGY ANALYSIS
The overall thermal energy can be obtained from Eqs. (1c) and
(7) and is expressed as

$$Q_{overall,thermal} = Q_{u,annual} + \frac{(E_{el})_{annual}}{0.38} \qquad (8)$$

The overall thermal efficiency of semi transparent hybrid PVT
air collector can be expressed as

$$\eta_{overall,th} = \eta_{th} + \frac{\eta_{el}}{0.38} \qquad (9)$$

where $\eta_{th}$ is calculated from Eq. (3) and $\eta_{el}$ from Eq. (6).
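The overall thermal efficiency combines the thermal efficiency with the electrical efficiency divided by the 0.38 conversion factor. A minimal Python sketch of that arithmetic follows; the function name and the sample efficiencies are hypothetical and serve only to show the scale of the electrical contribution.

```python
# Conversion factor from thermal to electrical energy for thermal power
# plants, as quoted in the text (Huang et al. [19]).
CONVERSION_FACTOR = 0.38


def overall_thermal_efficiency(eta_th, eta_el):
    """Thermal efficiency plus the thermal equivalent of electrical efficiency."""
    return eta_th + eta_el / CONVERSION_FACTOR


# Hypothetical collector: 30 % thermal and 11.4 % electrical efficiency.
eta_overall = overall_thermal_efficiency(0.30, 0.114)
```

Because the electrical term is divided by 0.38, each percentage point of electrical efficiency counts for roughly 2.6 points of equivalent thermal efficiency.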
3.6. OVERALL EXERGY
The annual exergy can be obtained from Eqs. (2) and (5)

$$Ex_{annual} = Ex_{thermal} + (E_{el})_{annual} \qquad (10)$$

The exergy efficiency of semi transparent hybrid PVT air
collector can be expressed as

$$\eta_{ex} = \eta_{th} \left[ 1 - \frac{T_a + 273}{T_o + 273} \right] \qquad (11)$$

The overall exergy efficiency of semi transparent hybrid PVT
air collector can be expressed as

$$\eta_{overall,ex} = \eta_{el} + \eta_{ex} \qquad (12)$$

For more details, please refer to the papers by Kamthania
et al. [17, 18].


4. SYSTEM AND DATA COLLECTION 

The analytical model has been derived for the experimental
setup installed at the Solar Energy Park of IIT, New Delhi, as
shown in Fig. 3 (a) and (b). The equations are derived for the
analytical model of various performance parameters (thermal,
electrical, overall thermal and exergy efficiency) considering a,
b, c and d type climatic conditions for different weather
stations. The ambient air temperature, number of sunshine
hours, number of clear days, temperature coefficient, cell
efficiency, global and diffuse radiation have been used as inputs
for training of the ANN. Climatic data and results of four weather
stations (Bangalore, Mumbai, Srinagar and Jodhpur) have been
used for training and the data of the fifth weather
station (New Delhi) has been used for testing the
ANN model.


5. METHODOLOGY 

The ANN has been defined in the MATLAB 7.1 neural network
toolbox as per the above mentioned parameters. The initial
values of the weights have been defined and an incremental
input is given to the network for estimating the outputs. When
the outputs are close to the result matrix and the calculated MSE
is within specified limits, the iterations are terminated and the
values of the weights are recorded. If the output matrix is close to
the desired results then the network is trained; otherwise the same
procedure is repeated with a new weight matrix. The values
obtained through the ANN model are then compared with the analytical
results for the New Delhi weather station. The RMSE deviation has
been calculated using the following equation.
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (X_i - Y_i)^2}{n}} \qquad (13)$$
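To make the RMSE comparison concrete, the following Python sketch computes the root mean square error between two equally long series, for example analytical and ANN-predicted efficiencies. It assumes the standard definition (square root of the mean squared difference); the function name and the sample values are hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between two equally long sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(x - y) ** 2 for x, y in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical monthly efficiencies in percent (analytical vs. ANN).
analytical = [10.2, 10.0, 9.7, 9.4]
ann = [10.1, 10.1, 9.6, 9.5]
deviation = rmse(analytical, ann)
```

Since the efficiencies are expressed in percent, the resulting RMSE is in the same units.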
6. RESULTS AND DISCUSSION
ANN helps in analysis and estimation studies before putting a
solar project in place. The purpose of this study is to develop
an ANN model for performance analysis of a semi transparent
hybrid PVT single and double pass air collector for different
PV technologies. The training cities have been chosen as
Bangalore, Mumbai, Srinagar and Jodhpur and the test city is New
Delhi. The performance parameters calculated from the ANN are
compared with the results obtained from the analytical study.

Fig. 4 shows the MSE curve for a typical iteration; the performance
of the network is shown against the goal set for the
network. MSE has been taken as the performance function with
the MSE training goal set as 0.005. It has been observed that LM
with 20 and 25 neurons in the hidden layer for the single and
double pass air collector respectively, 7 neurons in the input
layer and 4 neurons in the output layer is the most suitable
algorithm for the set MSE value.

The RMSE measures the average magnitude of error. It is
better to have lower RMSE values. The RMSE has been
calculated using Eq. 13. The RMSE values of the performance
parameters calculated from both the ANN model and the analytical
study considering a, b, c and d type weather conditions are
shown in Table 4. According to the results the deviations
are in the range of 0.056-4.763% for different output
parameters. It has been observed that the RMSE for electrical
efficiency, overall thermal efficiency and overall exergy
efficiency varies from 0.056 to 0.211%, 4.068 to 4.763% and
0.298 to 0.580% respectively.

Fig. 5(a) shows the monthly variation of electrical efficiency of the
single pass air collector for different PV technologies. The
electrical efficiency for the different PV technologies is
minimum in the month of May due to maximum solar radiation and
maximum in the month of January due to low solar radiation.
With the increase of solar radiation the cell temperature
increases and the electrical efficiency of the solar cell
decreases. The monthly electrical efficiency is maximum for
HIT and minimum for a-Si. Fig. 5(b) shows the monthly variation
of electrical efficiency of the double pass air collector for different
PV technologies. The monthly electrical efficiency is
maximum for HIT and minimum for a-Si for the same reason as
discussed for Fig. 5(a).

Fig. 6, 7 and 8 show deviations of various performance
parameters for single and double pass air collectors of different
PV technologies. It is observed that the double pass air collector
has higher values as compared to the single pass air collector.
Further, it is also observed that the maximum values of
electrical, overall thermal and overall exergy efficiency are
obtained for 'HIT' PV technology whereas the minimum values of
electrical, overall thermal and overall exergy efficiency have
been obtained for 'a-Si' type for the New Delhi weather station.
The values obtained from the ANN model are very close to the
analytical values.


CONCLUSION

In this paper ANN models have been developed using the
MATLAB 7.1 neural network toolbox for performance
analysis of a semi transparent hybrid PVT double pass air
collector for different PV technologies. The ANN model is based
on the feed forward back propagation algorithm with two hidden
layers. The LM with 20 and 25 neurons in the hidden layer for the
single and double pass air collector respectively, 7 neurons in
the input layer and 4 neurons in the output layer is the most
suitable algorithm with an MSE value of 0.005. It has been
observed that the analytical and ANN models are in fair agreement,
with RMSE values lower than 5%. Further, it is also observed that
it is advantageous to use ANN as compared to the traditional
method due to its speed, simplicity and ability to learn from
examples.

Fig. 1(a). Input, output and hidden layers of ANN for single pass
air collector.

Fig. 1(b). Input, output and hidden layers of ANN for double pass
air collector.

Fig. 2(a). Typical arrangement of ANN for single pass air
collector.

Fig. 2(b). Typical arrangement of ANN for double pass air
collector.

Fig. 3(a). Schematic diagram of hybrid photovoltaic thermal
single pass air collector.

Fig. 3(b). Schematic diagram of a hybrid photovoltaic thermal
double pass air collector.

Fig. 4. MSE obtained in the training of the network.

Fig. 5(b). Monthly variation of electrical efficiency of double
pass air collector.

Fig. 7. Annual variation of overall thermal efficiency of single
and double pass air collector.

Table 2: Number of clear days falling in different weather conditions for New Delhi weather station


PV Technology | Module efficiency η_0 (%) | Temperature coefficient β (°C⁻¹)
Ribbon cast Si (r-Si) | |
Cadmium telluride (Cd-Te) | 690 | 0.0026
Copper indium diselenide (CIS) | 620 |
Heterojunction with Intrinsic Thin layer (HIT) | |

Table 3: Specification for various silicon and non silicon based PV modules.






Copy Right © BIJIT — 2012 Vol. 4 No. 1 ISSN 0973 — 5658 


Amorphous Silicon, Cadmium Telluride, Organic Based Devices, Solar Concentrator

Table 1: Classification of photovoltaic on the basis of PV technology


Electrical Efficien Overall Thermal Efficien ‘Overall Exergy Efficiency 

DP | o DP | SP 

0.1874 45260 
41898 | 










r-SI 


ee | SP. — DP 
esl | 01874 4.5260 0.4125 
4.1898 
0.2939 
pPSi | 01610 | 0.1745 | 41632 | 45725 | 0.5553 | 0.3873 





Cd-Te 0.0675 4.7532 0.2984 
0.1280 0.1388 4.1314 4.6317 0.5312 0.3565 
0.1947 0.2110 4.1973 0.5809 0.4195 


Table 4: RMSE calculations of electrical, overall thermal and exergy efficiency for different PV technologies for single and
double pass air collector





Fig. 8 Annual variation of overall exergy of single and double pass air collector of different PV technology 
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Open Source Software Reliability Growth Model by Considering Change — Point 
V. B. Singh’, P. K. Kapur’ and Mashaallah Basirzadeh’ 


ABSTRACT 

Modeling techniques for software reliability are reaching
maturity. Software reliability growth models have been used
extensively for closed source software. The design and
development of open source software (OSS) is different from
closed source software. We observed some basic
characteristics of open source software: (i) more
instruction execution and code coverage taking place with
respect to time, (ii) release early, release often, (iii) frequent
addition of patches, (iv) heterogeneity in fault density and effort
expenditure, (v) frequent release activities that seem to have
changed the bug dynamics significantly, and (vi) bug reporting on
the bug tracking system that drastically increases and decreases. For
this reason, the bugs reported on the bug tracking system remain
in an irregular, fluctuating state. Therefore, the fault
detection/removal process cannot be smooth and may
change at some time point called the change-point. In this paper,
an instructions executed dependent software reliability growth
model has been developed by considering change-point in
order to cater for a diverse and huge user profile, the irregular state of
the bug tracking system and heterogeneity in fault distribution. We
have analyzed actual software failure count data to show
numerical examples of software reliability assessment for
OSS. We also compare our model with the conventional models in
terms of goodness-of-fit for actual data. We have shown that
the proposed model can assist improvement of quality for OSS
systems developed under the open source project.


Keywords: Open source software, reliability assessment,
software reliability growth model, bug tracking system,
change-point


1. INTRODUCTION 

The advancement in information technology has changed
the dynamics of life and society as well as software
development. It has added new dimensions like e-learning,
e-conferencing, e-commerce, e-meeting, e-governance, and
the list is now becoming endless. Since the mid 1990s, there
has been a surge of interest among academics and practitioners
in open source software. The design and development of open
source software is significantly different from that of
proprietary software. Open source software is developed by the
community for the community. The development of OSS is of
interdisciplinary nature and needs knowledge and expertise
from many scientific disciplines such as computer science,
management and organization, social sciences, law, economics
and psychology. In this paper, we measure the reliability
growth of OSS quantitatively by measuring the remaining
number of bugs in the software. The rest of the paper is
organized as follows. Sections A and B of the introduction deal
with the literature review of OSS and the change-point problem in
software reliability. In section 2, we discuss the modeling
framework comprising notations, assumptions and model
development. Section 3 deals with model validation, numerical
illustration and goodness of fit curves. Finally, section 4 deals
with concluding remarks and future direction.


A. OPEN SOURCE SOFTWARE WITH RELIABILITY

The use of open source software is increasing rapidly and its
role is growing in different domains ranging from
commercial and educational to research. According to Gartner's
report, about 80 percent of all commercial software will
include elements of open source technology by 2012 [24]. Open
source first evolved during the 1970s. Richard Stallman, an
American software developer who believes that sharing source
code and ideas is fundamental to freedom of speech, developed
a free version of the widely used Unix operating system under
GNU [5, 25]. The spirit of open source software is the free
right of using, reproducing, distributing and modifying the
software, which creates an efficient, economical and productive
software development model: establishing commercial projects
through the concept of open source, implementing
collaborative development through the network-based open
source community, allocating resources optimally,
increasing the transparency of projects, and reducing the risk
of development [6]. Eric Raymond, the main proponent and
co-founder of the open source project, is generally credited
with establishing the OS movement through his seminal
paper "The Cathedral and the Bazaar" [7], and characterized the
open source software development approach as:

"Given enough eyeballs, all bugs are shallow." (p. 41)

A classification of users and developers and their roles, as
shown in Figure 1, has been discussed in [17].

The author in [18] discussed many claims and counterclaims
for open source software on the basis of a number of factors
including cost advantage, source code availability, maturity,
vendor lock-in and external support. In the available literature,
many papers address the issue of reliability for open source
software qualitatively. Paper [19] proposes a number of
hypotheses and tries to analyze the relationship between
openness and reliability. A study has also been carried out on
bug report data of an open source project, and it has been
concluded that traditional software reliability growth models
cannot be applied to assess the reliability growth of open source
software because the design and development of open source is
different from that of closed source [1].


Delhi College of Arts & Commerce, University of Delhi, Delhi

‘singh vb@rediffmail.com, *pkkapurl @gmail.com and ’m_basir3 1@yahoo.com
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Figure 1[17] 

A software reliability assessment method concerned with the
software development environment of OSS has been discussed
in [4]. It proposes a software reliability assessment and
optimization analysis method for the OSS paradigm. Reliability
growth models have been presented by considering user
growth for open source software [23]. That paper also reveals
that the reliability growth curve of open source software is similar
to that of closed source software, by studying bug report
data from the bug tracking systems of software projects developed
under the open source environment.


B. CHANGE POINT PROBLEM IN SOFTWARE RELIABILITY
The fault detection rate may not be smooth and can change
at some time moment τ, called the change-point, due to changes
in defect density, testing strategies etc. Many researchers
have incorporated change-point in software reliability growth
modeling for closed source software. Zhao [20] first
incorporated change-point in software and hardware
reliability. Huang et al. [14] used change-point in software
reliability growth modeling with testing effort functions. The
change-point problem in OSS has been introduced by Singh et
al. [15]. Kapur et al. [2, 13] introduced various testing effort
functions and testing effort control incorporating change-point
in software reliability growth modeling. Kapur et al. [10, 11]
proposed software reliability growth models, and software
reliability growth modeling for fielded software has been
proposed by Kapur et al. [9]. Later on, an SRGM based on
stochastic differential equations incorporating the change-point
concept has been proposed by Kapur et al. [12].


2. MODEL DESCRIPTION 

During the middle and operational stages, the fault detection rate
normally depends on other parameters such as the execution rate
of CPU instructions, code expansion or code coverage [21]. The
success of OSS projects has been mostly attributed to the
speed of development, reliability, portability and scalability of
the resulting software. Recently, instructions executed
dependent models have been proposed to measure reliability
growth of open source software [23]. In this paper, we
consider a number of instructions executed dependent
software reliability growth model with change-point
for measuring the reliability growth of open source software.


(i) Notations

m, m(t): Expected number of faults identified in the time
interval (0, t]
e, e(t): Expected number of instructions executed on the
software in the time interval (0, t]
a: Constant, representing the number of faults lying dormant
in the software
k, p: Constants
b(t): Fault removal rate as a function of testing time

(ii) Assumptions

A mathematical model which can capture various types of
growth patterns as the testing/debugging progresses is
proposed in this paper.

The proposed model is based upon the following basic
assumptions.

1. The software failure phenomenon can be described by the
Non-homogeneous Poisson Process (NHPP).

2. The number of failures during testing is dependent upon
the number of instructions executed.

3. The number of instructions executed is a power function
of testing time.

4. The fault detection rate may change at some time
moment (called the change-point).

(iii) Modeling Framework


Using the above assumptions, the failure phenomenon can be
described with respect to time as follows in [8]

$$\frac{dm(t)}{dt} = \frac{dm(t)}{de(t)} \cdot \frac{de(t)}{dt} \qquad (1)$$

The first component of expression (1)
depends not only upon the number of faults remaining in the
software but also on the proportion of faults already detected.
Based on this assumption the differential equation for fault
identification / removal can be written as:

$$\frac{dm(t)}{de(t)} = \left[ k_1 + k_2 \frac{m(t)}{a} \right] (a - m(t)) \qquad (2)$$

Here $k_1$ is the rate at which residual faults cause failure. It is a
constant as each one of these faults has an equal probability of
causing failure. $k_2$ is the rate at which additional faults are
identified without their causing any failure.
Let the second component of expression (1) be defined as a
power function of testing time i.e.

$$\frac{de(t)}{dt} = p\,t^{k} \qquad (3)$$

Substituting (2) and (3) in (1) we have:

$$\frac{dm(t)}{dt} = \left[ k_1 + k_2 \frac{m(t)}{a} \right] (a - m(t))\, p\,t^{k} \qquad (4)$$

It is a first order differential equation. Solving it with the initial
condition m(0) = 0 we get:

$$m(t) = a\, \frac{1 - \exp\left( -\frac{b\,t^{k+1}}{k+1} \right)}{1 + \beta \exp\left( -\frac{b\,t^{k+1}}{k+1} \right)} \qquad (5)$$

Here $b = p\,(k_1 + k_2)$ and $\beta = k_2 / k_1$.

If we take k = 0 in equation (5), above model reduces to [3].
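(iv) Proposed Model by Considering Change-Point

We can write the differential equation for the fault detection
process as

$$\frac{dm(t)}{dt} = b(t)\,(a - m(t)) \qquad (6)$$

and if we take $b(t) = \dfrac{b\,t^{k}}{1 + \beta \exp\left( -\frac{b\,t^{k+1}}{k+1} \right)}$, i.e. a power
logistic function as follows in [26], and solve equation
(6), we get the same solution as given in equation (5).
Now by considering a change in fault detection rate at change
point τ, we can write

$$b(t) = \begin{cases} \dfrac{b_1 t^{k}}{1 + \beta \exp\left( -\frac{b_1 t^{k+1}}{k+1} \right)} & \text{for } t \le \tau \\[2ex] \dfrac{b_2 t^{k}}{1 + \beta \exp\left( -\frac{b_2 t^{k+1}}{k+1} \right)} & \text{for } t > \tau \end{cases} \qquad (7)$$

Here, $b_1$ and $b_2$ are fault detection rates before and after change
point.
The fault detection equation can be written as

$$\frac{dm(t)}{dt} = \begin{cases} \dfrac{b_1 t^{k}}{1 + \beta \exp\left( -\frac{b_1 t^{k+1}}{k+1} \right)} (a - m(t)) & \text{for } t \le \tau \\[2ex] \dfrac{b_2 t^{k}}{1 + \beta \exp\left( -\frac{b_2 t^{k+1}}{k+1} \right)} (a - m(t)) & \text{for } t > \tau \end{cases} \qquad (8)$$

After solving equation (8), we get

$$m(t) = \begin{cases} a\, \dfrac{1 - \exp\left( -\frac{b_1 t^{k+1}}{k+1} \right)}{1 + \beta \exp\left( -\frac{b_1 t^{k+1}}{k+1} \right)} & \text{for } t \le \tau \\[3ex] a \left[ 1 - \dfrac{(1 + \beta)\left( 1 + \beta \exp\left( -\frac{b_2 \tau^{k+1}}{k+1} \right) \right)}{\left( 1 + \beta \exp\left( -\frac{b_1 \tau^{k+1}}{k+1} \right) \right)\left( 1 + \beta \exp\left( -\frac{b_2 t^{k+1}}{k+1} \right) \right)} \exp\left( -\frac{b_1 \tau^{k+1}}{k+1} - \frac{b_2 \left( t^{k+1} - \tau^{k+1} \right)}{k+1} \right) \right] & \text{for } t > \tau \end{cases} \qquad (9)$$

If we take k = 0 and β = 0, the model reduces to [13] and [2]
respectively.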
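The change-point model can be explored numerically with a short Python sketch. It assumes a power logistic fault detection rate b_i t^k / (1 + beta exp(-b_i t^(k+1)/(k+1))), with b_1 before and b_2 after the change-point tau, and obtains the expected number of faults m(t) by solving dm/dt = b(t)(a - m(t)) with m(0) = 0 and continuity of m(t) at tau. The function names and all parameter values below are hypothetical and are not the estimates reported in the paper.

```python
import math


def _u(b, t, k):
    """Cumulative exponent b * t^(k+1) / (k+1)."""
    return b * t ** (k + 1) / (k + 1)


def mean_value(t, a, b, beta, k):
    """Expected faults by time t without a change-point."""
    e = math.exp(-_u(b, t, k))
    return a * (1.0 - e) / (1.0 + beta * e)


def mean_value_change_point(t, a, b1, b2, beta, k, tau):
    """Expected faults by time t when the rate changes from b1 to b2 at tau."""
    if t <= tau:
        return mean_value(t, a, b1, beta, k)
    u1_tau = _u(b1, tau, k)
    u2_tau = _u(b2, tau, k)
    u2_t = _u(b2, t, k)
    factor = ((1.0 + beta) * (1.0 + beta * math.exp(-u2_tau))
              / ((1.0 + beta * math.exp(-u1_tau))
                 * (1.0 + beta * math.exp(-u2_t))))
    # Remaining fraction of faults after the change-point.
    remaining = factor * math.exp(-u1_tau - (u2_t - u2_tau))
    return a * (1.0 - remaining)


# Hypothetical parameters chosen only to exercise the functions.
params = dict(a=467.0, b1=0.01, b2=0.02, beta=2.0, k=0.5, tau=19.0)
m_at_30 = mean_value_change_point(30.0, **params)
```

Setting b1 equal to b2 makes the second branch collapse to the model without a change-point, which is a convenient sanity check when fitting.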


3. MODEL VALIDATION 

To illustrate the estimation procedure and application of the
SRGMs (existing as well as proposed) we have carried out the
data analysis of real software data sets.


a. Description of Datasets

Data set 1 (DS-I)

We collected all failure data of the Keepass software developed
under an open source environment (www.sourceforge.net) from
19-Dec-03 to 27-Feb-07; 458 failures were observed. Keepass
is a password database utility. Users can keep their
passwords securely encrypted on their computers. A single
Safe Combination unlocks them all. From a graphical view of
the data, we identify the 19th month as the change-point.

Data set 2 (DS-II)

This data is cited from Fedora Core Linux
(http://fedora.redhat.com/ and [4]), which is one of the
operating systems developed under an open source project. We
have taken data up to release 3 for model validation. During
the course of 57 days 164 failures were observed. From a
graphical view of the data, we identify the 17th month as the change-point.
b. Comparison Criteria

The performance of SRGMs is judged by their ability to fit the
past software fault data (goodness of fit) and to predict the
future behavior of the fault.

The Mean Square Error (MSE)

The model under comparison is used to simulate the fault data;
the difference between the expected values, $\hat{m}(t_i)$, and the
observed data $y_i$ is measured by MSE as follows.

$$MSE = \sum_{i=1}^{k} \frac{(\hat{m}(t_i) - y_i)^2}{k}$$

where k is the number of observations. The lower MSE
indicates less fitting error, thus better goodness of fit [16].

Coefficient of Multiple Determination (R²)

We define this coefficient as the ratio of the sum of squares
resulting from the trend model to that from the constant model,
subtracted from 1, i.e.

$$R^2 = 1 - \frac{\text{residual SS}}{\text{corrected SS}}$$

R² measures the percentage of the total variation about the
mean accounted for by the fitted curve. It ranges in value from 0
to 1. Small values indicate that the model does not fit the data
well. The larger R², the better the model explains the variation
in the data.

Bias

The difference between the observation and prediction of
number of failures at any instant of time i is known as
$PE_i$ (prediction error). The average of PEs is known as bias.
Lower the value of Bias, better is the goodness of fit.

Variation

The standard deviation of prediction error is known as
variation.

$$Variation = \sqrt{\frac{1}{N-1} \sum_{i} (PE_i - Bias)^2}$$

Lower the value of Variation, better is the goodness of fit
[22].

Root Mean Square Prediction Error

It is a measure of closeness with which a model predicts the
observation.

$$RMSPE = \sqrt{Bias^2 + Variation^2}$$

Lower the value of Root Mean Square Prediction Error,
better is the goodness of fit [22].
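The comparison criteria can be computed together once a model's predictions are available. The Python sketch below is illustrative only: the function name `goodness_of_fit` is hypothetical, the prediction error is assumed to be observed minus predicted, and the sample failure counts are made up rather than taken from DS-I or DS-II.

```python
import math


def goodness_of_fit(observed, predicted):
    """Return MSE, R^2, bias, variation and RMSPE for two equal-length lists."""
    n = len(observed)
    # Prediction errors PE_i (assumed sign convention: observed - predicted).
    errors = [o - p for o, p in zip(observed, predicted)]
    residual_ss = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    corrected_ss = sum((o - mean_obs) ** 2 for o in observed)
    bias = sum(errors) / n
    variation = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return {
        "mse": residual_ss / n,
        "r2": 1.0 - residual_ss / corrected_ss,
        "bias": bias,
        "variation": variation,
        "rmspe": math.sqrt(bias ** 2 + variation ** 2),
    }


# Made-up cumulative failure counts and model predictions.
fit = goodness_of_fit([10, 20, 30, 40], [12, 18, 33, 39])
```

Reporting all five values side by side, as the paper does for each model, guards against choosing a model that scores well on one criterion only.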


5. NUMERICAL RESULTS AND ANALYSIS
The parameter estimation and comparison criteria results for
DS-I and DS-II of all the models under consideration can be
viewed through Table I(a-b) and Table II(a-b) respectively. It
is clear from the tables that the proposed model (equation 9)
provides better goodness of fit for DS-I and DS-II. The
proposed model estimates the total number of faults latent in
the software as 467 against 458 observed failures for DS-I, which
means 9 bugs are still remaining in the software, and as 181
against 164 observed failures for DS-II, which means 17 bugs are
still remaining in the software (a fairly reasonable estimate).
It has also been observed that the GO model overestimates the
value of parameter "a".

Figure 2: Goodness of fit curves for data set 1 (DS-I); time in months; curves: Actual, GO, Yamada, K-G, Proposed
Figure 3: Goodness of fit curves for data set 2 (DS-II)


CONCLUSION 

In this paper, we have proposed a software reliability model by
considering huge user growth in the case of open source software.
The user growth is expressed in terms of the number of
instructions executed. The proposed model also incorporates the
change in fault detection rate due to drastic changes in the bugs
reported on the bug tracking system. Parameter estimates,
comparison criteria results and goodness of fit curves have also
been presented in comparison with conventional models. However,
there is a need to present the model in a form that is friendly
to software developers.

In future, we will try to discuss a general framework to
measure the reliability growth of open source software by
considering the detection and correction processes (bug reporting
and bug fixing).
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ABSTRACT 

Faced with a global shortage of skilled health workers due to
attrition, countries are struggling to build and maintain an
optimum knowledge workforce in healthcare for delivering
quality healthcare services. Forces that affect healthcare
professional turnover need to be addressed before a
competent, uniformly adoptable strategy can be proposed for
mitigating the problem. In this study we investigate the effect
of socio-demographic characteristics on attrition of the
healthcare knowledge workforce in northern parts of India
that have a wide gradient of rural and urban belts, taking into
account both public and private healthcare organizations.

For this purpose the healthcare professional attrition tracking
survey (HATS) was designed. The data has been collected from
a random sample of 807 respondents consisting of doctors,
nurses, paramedics and administrators to explore the
relationships between various factors acting as antecedents in
affecting the job satisfaction, commitment and intention of a
healthcare professional to stay in the job. Structured
questionnaires were utilized as the data collection tools. Both
public and private healthcare organizations in urban and rural
areas were covered by the survey.

Descriptive statistics and factor analyses on the
Rotated Factor Matrix using Principal Components Analysis
(PCA) in the SPSS 16.0 package were carried out. Six factors of
attrition, namely Compensation and Perks, Work Life Balance,
Sense of Accomplishment, Work Load Leading to Exhaustion,
Need for Automation and Technology Improvement, and Break
Monotony of Work, have been identified as the main factors
with a data reliability of 0.809%. Based on the survey
response and analysis, a highly feasible strategy of utilizing
information technology implementation for increasing worker
motivation, job satisfaction and commitment to reduce
attrition has been proposed.


Keywords: Healthcare professional, healthcare information
technology, attrition, job satisfaction, work-life balance.


1. INTRODUCTION 

The health care industry relies a lot on advanced medical
technology, but it is also a labor-intensive industry. In recent
times there has been an increase in healthcare costs and healthcare
staff shortages, leading healthcare organizations to undergo
changes [1, 2]. Some of these changes have led to increased
performance expectations and efficiency, leading to a decrease in
staff morale and an increase in attrition [3-5]. In this paper, the
terms "health care professionals" and "human resources for
health" are used interchangeably and comprise doctors, nurses,
paramedics and hospital administrators. Researchers have
identified that the shortage of skilled workers in hospitals
leads to high patient mortality, job dissatisfaction and
burnout [6, 7]. The migration of health professionals has been
debated to be one of the main reasons for attrition and has been
the main focus of such studies [8, 9]. It has been argued that
opportunities for professional training, higher salaries and
perks and better living conditions act as "pull" factors, while surplus
production of health personnel, resultant unemployment, less
attractive salary, stagnation or underemployment coupled with
lack of infrastructure act as "push" factors for the youth to
migrate. A number of strategies have been discussed to
counteract migration [8-11]. Human resources management
plays a significant role in retaining health care workers [12].

As the Indian healthcare industry experiences phenomenal growth,
hospitals are moving towards excellence rather than
survival and gearing up to fill the gaps in three key areas:
people, process and technology. India is one of the most
populous countries, with the larger part of its population in rural
areas [13] and an estimated 27.5% of Indians still living below the
poverty line who cannot afford the healthcare provided by private
organizations due to cost and unreachable locations. Most of
them utilize the public healthcare provided by government
organizations. In a recent survey of the dichotomy existing in the
utilization of private and public health services in India, it
emerged that a bias towards the use of private health services
in spite of the earlier mentioned problems may be due to the
view that public healthcare services are not of good
quality [14].

Even with a greater number of health care professionals, viz.
doctors, nurses, pharmacists and paramedics, being trained, the
Indian healthcare sector suffers from an acute shortage of
healthcare professionals and of facilities delivering quality
healthcare services to citizens [15]. According to a survey
carried out in 2008-09, India has only around 85,000 doctors
practicing modern medicine and 1.5 million nurses to serve its
population of more than one billion. It has 0.8 beds per 1000
population and 0.6 doctors per 1000 population (lowest in the
world). This means 6 doctors per 10,000 patients, with a
doctor/nurse ratio of 0.83 compared to China having 20. This
large disparity indicates a high attrition of knowledge
workers in healthcare.

Implementation and utilization of information technology in
healthcare (commonly identified as Health Information
Technology or HIT) has proven to be of immense benefit,
including improved patient care, reduced waste and
inefficiency in services, and reduction in adverse drug effects
and medical errors [16-18]. Since healthcare professionals' job
satisfaction also has important implications for quality
healthcare delivery, the relationship between the use of HIT
and physician career satisfaction should be probed. An earlier
small-scale study [19] determined that using more information
technology was the strongest positive determinant of
physicians being very satisfied with their careers.
India has joined the bandwagon of information technology
adopters and is one of the main global forerunners in this
area [20,21]. A number of government policies and programs
have been developed pertaining to the use of healthcare
information technology (HIT) to improve the quality of
healthcare delivery [22,23]. Major private (corporate)
hospitals and public hospitals at state level have
implemented hospital information systems for patient
management, employee management, inventory, pharmacy,
laboratory etc. [24,25]. While there are articles indicating
a greater danger of brain drain in the area of healthcare
in India, there are no detailed studies that offer effective
retention strategies for reducing attrition in the Indian
scenario.
The aim of this paper is to develop a probable strategy that
uses implementation of information technology to reduce
attrition. To achieve this objective, we use data collected
from doctors, nurses, paramedics and administrators from
different public and private organizations in both rural and
urban areas. This is in contrast to earlier studies, where the
sample belonged to homogeneous groups or to an identical
location of work.


2. MATERIALS AND METHODS 

Data for this study came from the second round of the
Healthcare Attrition Tracking Survey (HATS). HATS is part of
a multi-level study within an ongoing doctoral research
program, conducted to address issues regarding attrition
among healthcare professionals and to determine whether
implementation of Health Information Technology in hospitals
and healthcare centres can work as an effective retention
strategy in India. HATS was conducted among a non-homogeneous
group of skilled healthcare professionals such as doctors,
paramedics, and administrative and managerial staff in
public as well as private hospitals covering rural and urban
regions of Northern India. The survey was designed on the basis
of informal discussions with nearly 40 healthcare professionals
who had participated in the International Conference on
Medical Informatics held by the Indian Association for Medical
Informatics (IAMI) in Hyderabad, India (Nov. 2009).

In the first-round pre-test, studies were conducted in five
hospitals (minimum 100 beds), one each from the five states
of Delhi, Haryana, Uttar Pradesh, Madhya Pradesh, and Jammu
and Kashmir. This was followed by focus group discussions.
Based on the results obtained from these, an elaborate second
round of data collection was carried out using a complex
sampling design of 40 hospitals, randomly selected to yield a
non-biased representative sample of the healthcare workforce
in both rural and urban areas; this round led to the present
paper. Out of the 2000 respondents approached for the survey,
data was collected from 807 respondents using the
questionnaire tool developed by the authors and reviewed by
experts in the field. The major challenge faced was obtaining
permission from the HR authorities to conduct the survey,
owing to issues of transparency of the system and HR policies.
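
The response rate implied by these two figures is not stated in the text. A minimal sketch of the arithmetic, using only the 2000 approached and 807 responding participants reported above (the function name is illustrative):

```python
# Counts taken from the paragraph above.
APPROACHED = 2000
RESPONDED = 807


def response_rate(responded: int, approached: int) -> float:
    """Return the survey response rate as a percentage."""
    if approached <= 0:
        raise ValueError("approached must be positive")
    return 100.0 * responded / approached


rate = response_rate(RESPONDED, APPROACHED)
print(f"Response rate: {rate:.2f}%")
```

A rate of this size means that roughly three in five of those approached did not respond, which is worth keeping in mind when reading the percentages in Table I.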

Each participant was screened to determine survey eligibility
based on the following criteria before filling in the
questionnaire: Criterion 1 (Origin): Health care professionals
should be of Indian origin. Criterion 2 (Completion of
Training): Respondents should have completed their training
and be licensed in India. Criterion 3 (Job Satisfaction):
Respondents were initially questioned regarding their view on
current job satisfaction. Those who responded “don’t know” or
“refuse to answer” were excluded from the HATS survey.

The selected respondents were provided a questionnaire that
contained 60 questions that could provide insight into their
job satisfaction, work environment, self-development,
supervisor relationship, reasons for leaving a job, future
plans, and their practical knowledge, attitude and usage of
HIT. The responses were recorded on a five-point Likert scale
from 1 (strongly agree) to 5 (strongly disagree), as yes/no
options, and as open-ended questions inviting their views.

Statistical Analysis: A random 5% sample of responses was
checked for coding errors. Wherever the data was incomplete
or unclear, the respondents were approached individually to
collect the data again. The reliability test on the data gave
0.809%. Data were analyzed by means of factor analysis with a
rotated factor matrix, using Principal Components Analysis
(PCA) in the SPSS 16.0 package, to determine the
relationships between factors influencing attrition.
Descriptive statistics included percentage rates for
categorical variables, means and standard deviations. The
categorical variables considered were gender, marital status,
age, education, work nature, location, organization type, work
experience and income. Chi-square tests were performed to find
the associations between the reasons indicated for leaving a
job and the number of respondents, and t-tests were performed
to compare the contribution of each categorical variable to
the forces of attrition. Descriptive statistics were performed
to analyze the demographic details of the respondents.
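
The text does not say which reliability statistic was used. A common choice for Likert-scale questionnaires is Cronbach's alpha, and the sketch below shows how such a coefficient could be computed, on the assumption (not confirmed by the source) that the reported figure is of this kind. The function name and the sample answers are hypothetical and are not HATS data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(col) for col in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four Likert items (1 to 5).
sample = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(sample)
print(f"alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items move together; a threshold of about 0.7 is often treated as acceptable, although that convention is not taken from the paper.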


3. RESULTS

The sample was predominantly male (57.6 ± 0.5%). The
respondents were mostly middle-aged (52.1%), in the range of
26 to 35 years, and mostly married (62.4%) and living with
family. Nearly 20% of the married respondents, especially
males, were living alone, with their family in their
respective home towns. Almost two-thirds of the participants
were doctors, paramedics, nurses and administrators who had
less than a year of practice in the current organization and
were also middle-aged. 54.7% of the participants were
graduates, while the postgraduates were 34.5%. Undergraduates
were few (11.6%). Approximately equal numbers of doctors and
of nurses and paramedics participated, while the
administrators were fewer. There was not much difference in
the number of participants based on their income.
During the survey it was identified that many respondents had
shifted jobs within a year and some had decided to do so
within a short period of time. Through open-ended questions,
the reasons for shifting and their future plans to shift were
ascertained (Fig. 1). The effects of sociodemographic details
on the responses were calculated and plotted in graphs. A
chi-square test was performed to ascertain their significance
for migration.
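
The paper ran its tests in SPSS and does not list the underlying contingency tables. As an illustration of the chi-square test of independence mentioned above, the sketch below computes the statistic and degrees of freedom for a hypothetical gender by reason-for-shifting table; the counts are invented for the example and are not survey results.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows = male/female,
# columns = heavy workload / no social benefits / low pay.
observed = [
    [30, 20, 10],
    [20, 20, 20],
]
# Standard chi-square critical values at the 0.05 level.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815}

stat, dof = chi_square_independence(observed)
significant = stat > CRITICAL_005[dof]
print(f"chi2 = {stat:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

The statistic is compared with a tabulated critical value here to keep the example dependency-free; a statistics package would normally report a p-value instead.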
It was observed that gender, age, marital status, nature of the
work profile, work experience and wage were significant
with respect to the reasons for shifting jobs. The three main
reasons identified were heavy workload, no social benefits
and low pay structure. The distribution of the salary drawn by
the respondents had greater significance for the reasons
identified than the other variables. A non-cooperative boss and
frequent transfers were also identified.
The factor analysis with the rotated factor matrix led to six
factors of attrition, as under:

Factor 1 : Compensation and Perks

Factor 2 : Work Life Balance

Factor 3 : Sense of Accomplishment

Factor 4 : Work load leading to Exhaustion

Factor 5 : Need for Automation and Technology Improvement

Factor 6 : Monotony of Work
All the above six factors were compared with the 9 descriptive
parameters indicated in Table I. Only those that had a
significant effect on the forces of attrition are described in
detail in this paper. Gender, marital status, age and education
did not contribute much. Time spent by a healthcare
professional at an organization does contribute to attrition.
Two factors, namely how the organization contributes to the
work-personal life balance and the extent of the workload,
seem to be the major contributors.
Stress due to excessive workload was the main contributor when
the type of healthcare organization, i.e. private versus
public, was considered. The nature of the respondents' work
seems to contribute significantly to attrition: four of the
six factors were affected. These four factors, namely
Compensation and Perks, Work Life Balance, Sense of
Accomplishment, and Need for Automation and Technology, were
all significant at the 0.01 level (Table II).


Irrespective of the salary package, five out of the six factors
of attrition identified were found to contribute significantly
to attrition. Compensation and perks and the need for
implementing automation and technology, both contributing to
job satisfaction in terms of sense of accomplishment, seem to
be the major affecting factors (Table III).

The proportion of respondents proposing to shift from their
existing job within the next few years was further
investigated. The doctors were more prone to shift jobs
compared to others (Fig. 2). It was determined that the
proportion of those who did not plan to change jobs in the near
future was greater than that of those who had planned to shift
in the near future. Male health professionals, especially
those who were married, and health professionals with low
income packages were very keen to change jobs. Middle-aged
professionals were also keener to shift, with job satisfaction
and salary indicated as the prime reasons.

The respondents were also tracked regarding their usage of
HIT in order to determine their awareness of, and willingness
to adopt, HIT to increase job efficiency (Fig. 3). It was also
observed that health professionals with a minimum of
postgraduate education and those who were middle-aged had
greater computer awareness.


4. DISCUSSION 
Results show a significant difference in attitudes towards
factors affecting attrition. The results provide evidence that
economic motivation as a factor for changing jobs is not an
independent, stand-alone factor in itself, but rather a
component of broader factors that take into consideration the
yearning for development on both the professional and
personal fronts.

The respondents were further questioned to ascertain the need
for HIT in their work and their willingness to undergo further
IT training. 80% of the respondents felt the need to implement
HIT to simplify their work, and almost all of them were ready
to undergo training, with an overall 60% of respondents being
favourable.

This finding is a departure from previous studies, which
indicate that the intention of healthcare professionals to
frequently change jobs and to migrate to foreign countries
depends mainly on remuneration [26,27].

Based on a broader framework of understanding derived from 
the results of this study, a number of inferences can be drawn 
relating to strategies to encourage retention. 

Factor 1 (Compensation and perks), which refers to providing
incentives and extra income in terms of benefits, needs to be
structured through contested policies of public and health
sector reforms that would induce health care workers to
continue in the existing organization [11].

Factor 2 (Work life balance) depends on the nature of the work,
the type of workplace and issues in the workplace. Introducing
strategies like flexible work options, specialized leave
policies, paid maternal leave, paternal leave, etc. can
increase the satisfaction level of healthcare professionals.
Doctors and administrators, who spend a greater part of the day
in the hospital, are affected by work life balance issues.
Factor 3 (Sense of accomplishment) is about the job
satisfaction felt by healthcare workers. It does not depend on
monetary issues; it deals with the sense of achievement and
fulfilment felt by the employees. A key to building such a
culture is involving the medical staff members in
collaborative decisions on clinical and operational issues [28].
Factor 4 (Work load leading to exhaustion) and Factor 6
(Monotony of Work) refer to overworked health care
professionals. While this was not much of a problem in the
urban hospitals surveyed, it was more prominent in the rural
areas. This is due to higher workloads, coverage of large
geographic areas, lower access to specialists, and a broad
array of patients. This points to the need to improve working
conditions and the professional interface with other health
professionals and society in the rural areas. Planned
interventions could employ non-financial incentives such as
recognition by management, performance review and
improved inter-professional working relationships, to uphold
and strengthen the professional ethos of health professionals
[29].
Factor 5 (Need for Automation and Technology Improvement)
implies the requirement for HIT implementation in the health
care industry. The supply of good support, education and
training is a key approach to attracting and retaining allied
health practitioners, especially in rural locations [12,30]. HIT
enables health care professionals to confidently access,
interpret, and apply organisational knowledge, patient care
procedures, professional workforce competencies, best practice
knowledge and other skills information in a manner that
improves patient satisfaction, achieves positive clinical
outcomes, and maximises cost savings for the organisation
[18,19]. In the present study, irrespective of gender, age,
education and location, the importance of implementing HIT was
stressed by almost all respondents. The nature of work done by
respondents seems to play a significant role in assigning the
need for automation and technology as a major factor of
attrition. Doctors seemed to be the predominant users of
computers, ahead of healthcare administrators, nurses and
paramedics. It was also found that HIT usage was more
prevalent in urban hospitals than in rural hospitals.
Moreover, the difference in salary does not seem to detract
from the fact that implementation of HIT was seen as a basic
requirement by healthcare professionals.
Based on the discussions with the respondents, it was
understood that healthcare professionals leave their jobs
because of the greater job opportunities and higher pay
packages abroad. Attrition of postgraduate doctors is seen to
be driven by the lure of attractive salary packages and better
technologically equipped healthcare facilities, besides higher
studies. Medical professionals working in rural private health
set-ups cited as reasons for leaving their job the search for
opportunities that provide not only good financial benefits
but also better professional development through adoption of
newer technologies. Even given the industry-standard salary,
they were still ready to shift jobs to organizations that were
endowed with advanced technologies of healthcare delivery.
Based on these observations, it can be understood that for any
hospital and health care system the planning of manpower
(human resources) is vital [31]. Detailed planning of human
resources and a plan of action for their selection, training
and deployment are very important factors to be considered
right from project planning to implementation, and should be
undertaken at the inception of the project.

Other than better salary packages, financial benefits, a better
work environment etc., implementing HIT to reduce workload
stress, enrich knowledge and core specialization, and improve
quality of service can work as an ideal strategy to increase
the job satisfaction of healthcare professionals, thereby
reducing attrition. This also reduces medical errors and
increases quality in healthcare delivery [18,32]. Healthcare is
rapidly becoming an interconnected ecosystem, with IT as its
circulatory system. While the above strategies can be followed
uniformly among all healthcare professionals irrespective of
their nature of work and location, the following guidelines
may be followed especially in India. Since all the processes of
recruitment and selection are critical and the attrition rate
of knowledge workers in healthcare is significant, the
healthcare industry should focus on employing the right talent
and developing that talent to increase retention in the
organization for a longer period of time.

A potential solution to bridge the acute shortage of healthcare
workers and reduce the attrition rate is providing
accessibility to online healthcare, which has emerged as a very
important tool for offering healthcare services that can be
accessed by patients across boundaries. Online healthcare
connects patients and doctors via internet services. Online
health portals can reduce workload and streamline processes
for consultations, booking appointments, maintaining patient
health records and getting second opinions, among various
other services offered.

Healthcare professionals must be provided with financial help
and resources to further their knowledge in the realm of HIT,
and with mandatory practical exposure to using computers, the
internet etc. They should be offered incentives to encourage
them to use the technologies implemented. They should be made
aware of the benefits of using computers, such as reduced
workload and increased quality of service. They should also be
trained to use the technology to learn about guidelines, to
search medical and health databases for vital information, to
retrieve information from journals and e-books, and to keep in
touch with professional groups. Training should be provided to
reduce their fear that the use of technology will increase
work complexity.

Implementation of technology and adoption of Healthcare
Information Technology applications and best practices would
result in simplified processes. The benefits would be in terms
of a Unique Health Identification Number (UHID) for each
patient, Electronic Medical Records (EMR), telemedicine,
reduction in physician errors, time savings in processes such
as information retrieval, adoption of international standards
and best practices, instant availability of administrative
data, increased financial savings, and clinical trials and
research. This in turn would bring transparency to the system
and healthier working conditions. Improved efficiency and
profitability would lead to better employee compensation and
working conditions, thereby leading to retention of knowledge
workers in healthcare.


ACKNOWLEDGEMENT 


The authors acknowledge all the respondents and
administrative staff in over 40 hospitals for allowing
the survey to be conducted.


REFERENCES 


[1]. J. Buchan, “The 'greying' of the United Kingdom
nursing workforce: implications for employment policy
and practice”, Journal of Advanced Nursing, vol. 30(4),
pp. 818, 1999.

[2]. M. D. Leurer, G. Donnelly, and E. Domm, “Nurse
retention strategies: advice from experienced registered
nurses”, Journal of Health Organization and
Management, vol. 21(3), pp. 307-319, 2007.

[3]. G. M. Acker, “The Effects of Organizational
Conditions (Role Conflict, Role Ambiguity,
Opportunities for Professional Development, and Social
Support) on Job Satisfaction and Intention to Leave
among Social Workers in Mental Health Care”,
Community Mental Health Journal, vol. 40, pp. 65-73, 2004.

[4]. C. S. Borrill, J. Carletta, A. J. Carter, J. F. Dawson, S.
Garrod, A. Rees, et al., “The Effectiveness of Health
Care Teams in the National Health Services”, Aston
Centre for Health Service Organization Research,
Birmingham, UK, 2001.

[5]. H. Lu, A. E. While, and K. L. Barriball, “A model of job
satisfaction of nurses: a reflection of nurses’ working
lives in Mainland China”, Journal of Advanced Nursing,
vol. 58(5), pp. 468-479, 2007.

[6]. L. Aiken, S. Clarke, and D. Sloane, “Hospital staffing,
organization and quality of care: cross national
findings”, International Journal for Quality in Health
Care, vol. 14(1), pp. 5-13, 2002.

[7]. L. Aiken, S. Clarke, D. Sloane, J. Sochalski, and J. Silber,
“Hospital nurse staffing, patient mortality, nurse
burnout and job dissatisfaction”, JAMA, 288(16), 1987.

[8]. B. Stilwell, K. Diallo, P. Zurn, M. Vujicic, O. Adams,
and M. Dal Poz, “Migration of health-care workers from
developing countries: strategic approaches to its
management”, Bulletin of World Health Organization,
vol. 82(8), pp. 601, 2004.

[9]. T. Wuliji, S. Carter, and I. Bates, “Migration as a form
of workforce attrition: a nine-country study of
pharmacists”, Human Resources for Health, vol. 7, pp.
1-32, 2009.

[10]. A. Hagopian, M. J. Thompson, M. Fordyce, K. E.
Johnson, and L. G. Hart, “The migration of physicians
from sub-Saharan Africa to the United States of
America: measures of the African brain drain”, Human
Resources for Health, vol. 2, pp. 17, 2004.

Copy Right © BIJIT — 2012 Vol. 4 No. 1 ISSN 0973 ~ 5658 


[11]. I. Mathauer, I. Imhoff, “Health worker motivation in
Africa: the role of non-financial incentives and human
resource management tools”, Human Resources for
Health, vol. 4, pp. 24, 2006.

[12]. S. M. Kabene, C. Orchard, J. M. Howard, M. A.
Soriano, and R. Leduc, “The importance of human
resources management in health care: a global context”,
Human Resources for Health, vol. 4(20), pp. 1-17.

[13]. Deloitte-CII Report, Medical Technology Industry in
India, July 2010.

[14]. C. Kumar, R. Prakash, “Public-Private Dichotomy in
Utilization of Health Care Services in India”,
Consilience: The Journal of Sustainable Development,
vol. 5(1), pp. 25-52, 2011.

[15]. Online Revolution- Delivering healthcare at doorstep,
e-healthonline, 2010.

[16]. E. Alberdi et al., “Use of computer-aided detection
(CAD) tools in screening mammography: a
multidisciplinary investigation”, The British Journal of
Radiology, vol. 78, pp. S31-S40, 2005.

[17]. H. Lerum, G. Ellingsen, and A. Faxvaag, “Effects of
Scanning and Eliminating Paper-based Medical Records
on Hospital Physicians' Clinical Work Practice”, Journal
of the American Medical Informatics Association, vol. 10(6),
pp. 588-595, 2003.

[18]. D. W. Bates et al., “Reducing the Frequency of Errors
in Medicine Using Information Technology”, Journal
of American Medical Informatics Association, vol. 8,
pp. 299-308, 2001.

[19]. M. Weiner, P. Biondich, “The Influence of Information
Technology on Patient-Physician Relationships”, J.
Gen Intern Med, vol. 21, pp. 835-39, 2006.

[20]. N. Hanna, “Exploiting the Information Technology for
Development”, World Bank Discussion paper 246.

[21]. S. Sahay and S. Madon, “Geographic Information
Systems for Development Planning in India: Challenges
and Opportunities” in M. Odedra (ed), Information
Technology and Socio-economic Development
Opportunities and Challenges, Ivy League publishing,
pp. 42-52.

[22]. S. K. Mishra et al., “Design and Implementation of
Telemedicine Network in a Sub Himalayan State of
India”, Proceedings of 8th International Conference on
e-Health Networking, Applications and Services,
Healthcom 2006, IEEE, pp. 78-83, 2006.

[23]. National Health Policy 2002, www.mohfw.nic.in/
NRHM/documents/National Health policy 2002.pdf.
Last accessed on April 19, 2011.

[24]. M. Khandhar, “Health Management Information
System (HMIS)” in Compendium of E-governance:
Initiatives in India, (eds) P. Gupta and R. K. Bagga,
University Press, 2008.

[25]. K. Jagirdhar, “Srishti Software - Jayadeva Hospital
HMIS - case study”, http://blogs.siliconindia.com/ Last
accessed on April 19, 2011.
[26]. S. M. Shortell, J. Schmittdiel et al., “An Empirical
Assessment of High-Performing Medical Groups:
Results from a National Study”, Medical Care Research
and Review, vol. 62(4), pp. 407-434, 2005.

[27]. K. C. Lun, “The Role of Information Technology in
Healthcare Cost Containment”, Singapore Med. J, vol.
36, pp. 32-34, 1995.

[28]. K. Pillemer, “A higher calling: Choose nursing
assistants carefully, train them well, and your turnover
rates will dwindle”, Contemporary Long-Term Care,
vol. 20(4), pp. 50-2, 1...

[29]. N. Margolis, E. Booker, “Taming the healthcare cost
monster”, Computerworld, vol. 26(1), pp. 14-5, 1992.

[30]. J. K. Young, “Quality care on a budget: Realizing
benefits from clinical systems”, Computers in
Healthcare, vol. 13, pp. 34-5, 1992.

[31]. B. Chaudhry et al., “Systematic Review: Impact of
Health Information Technology on Quality, Efficiency,
and Costs of Medical Care”, Ann Intern Med., vol. 144,
pp. 742-752, 2006.

[32]. E. Oren, E. R. Shaffer, and B. J. Guglielmo, “Impact of
emerging technologies on medication errors and adverse
drug events”, American Journal of Health-System
Pharmacy, vol. 60(14), pp. 1447-1458, 2003.


Figure 1. Reasons provided for shifting jobs within last ...

Figure 2. Proportions of respondents planning to shift jobs
(legend: within 2 years, within 5 years, not going anywhere)

Figure 3. Proportions of respondents using IT


TABLE I Demographic details of the respondents

No.  Parameter                 Percentage   (Count)
1    Gender                    ...
2    Age
       17-25                   18.7%        (150)
       26-35                   52.1%        (417)
       36+                     30%          (240)
3    Marital Status
       Married                 62.4%        (49...)
       Unmarried               38.4%        (307)
4    Work Experience
       <5 years                76.1%        (609)
       >5 years                24.7%        (198)
5    Education
       Undergraduate           11.6%        (93)
       Graduate                54.7%        (438)
       Postgraduate            34.5%        (276)
6    Nature of Work
       Doctors                 38.9%        (312)
       Nurses & paramedics     37.1%        (297)
       Administrators          24.7%        (198)
7    Income (Rs)
       up to 10,000            20.5%        (164)
       10,000-20,000           18.9%        (...)
       20,000-30,000           ...          (...)
       30,000-40,000           16.6%        (13...)
       >40,000                 17.9%        (143)
8    Type of Hospital
       Public                  39.2%        (316)
       Private                 60.8%        (491)
9    Location of Hospital
       Urban                   ...          (386)
       Rural                   ...          (...)

Continued on Page No. 30



BVICAM’s International Journal of Information Technology (BIJIT)
Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi 


Optimization of Material Procurement Plan — A Database Oriented 
Decision Support System 


Shyamalesh Khan and Sanjay Kumar


Submitted in April 2011; Accepted in January 2012 


ABSTRACT 

Recently, the business in steel has undergone a sea change.
Customer requirements for steel products have become
increasingly demanding in terms of better quality and
specification. Globalization has thrown open the market to
intense competition. The rise in raw material cost has put
pressure on the steel industry to optimize its procurement
strategy from time to time. It has become imperative to
re-look at production strategies and production costs and to
work out methods to address market situations in a dynamic
manner. Minimization of the production cost in a Blast Furnace
(BF) in an integrated steel plant is a complex problem, as it
links the quality, quantity, cost and freight of raw materials
with the production targets and present operating regimes.
Coal forms a major source of cost in the entire gamut of Iron
& Steel production. Coal quality has a direct bearing on BF
productivity and the final cost of hot metal. Steel plants
procure coal based on quality requirements, coal
availability and linkages with coal sources. This paper is
based on application software that deals with the procurement
of coal based on optimization techniques integrated with
heuristic and statistical models.

The software has been developed using the C programming
language, with Oracle Developer tools as the front end and an
Oracle database as the back end. This is an excellent tool for
finalizing the coal procurement plan, with a view to
minimizing hot metal cost and achieving the desired coke
quality at minimum cost. The software can also be utilized for
assessing the effect of Rupee/Dollar parity, the effect of the
quality of any individual coal etc. on the coal procurement
plan and hot metal cost.


Keywords: BF: Blast Furnace, M10: Micum 10 index, DSS: 
Decision Support System 


1. INTRODUCTION 

The hot metal production process in an integrated steel plant
is shown in Fig. 1. Coke made from a coal blend at the coke
oven, iron ore, sinter and other burden materials are used in
the blast furnace for production of hot metal. Coke is the
most important raw material fed into the blast furnace in
terms of its effect on blast furnace operation and hot metal
quality. It is well known that use of superior quality coke in
the blast furnace results in improvements in productivity and
coke rate. Coke is produced from coal which is procured from
different indigenous sources as well as sources abroad. These
coals differ in properties such as ash, volatile matter etc.


Figure 1: Schematic of hot metal production process


Customers today have different options for selecting the best.
They are looking for cheaper steel with better quality. This
requires manufacturers to produce at optimum cost. Coal
accounts for a significant part of hot metal cost.
Selection of coal for an integrated steel plant is complex, as
it is governed by several logistics. One has to ensure lower
ash content as well as a coal rank suitable for bearing the
burden load in blast furnace operation. The most economic
blend of Primary Coking Coal (PCC) and Medium Coking Coal (MCC)
suitable for operation is the prime objective.

At first glance it seems that greater use of cheaper coal, such as
indigenous coal, would be economical. However, this is not so. The
developed application software helps in decision making for
selecting the right combination of coals from different sources
in order to minimize the hot metal production cost in a steel
plant. The software simulates different operating scenarios at
the plant, combines them to arrive at coal procurement plans
and finally optimizes to get minimum cost hot metal solutions
for a plant for a given range of target productions and
operational constraints. It can also be used in the techno-economic
evaluation of new coals, including imported coals.
Steel producers have been using different technologies to
produce iron at minimum cost, both by minimizing the coal
procurement cost and by using a proper blend of coals
suitable for blast furnace production. The scope of this
Decision Support System (DSS) software is limited to
minimum cost hot metal production. It covers steel plant
operation up to the blast furnace and studies the effect of
coal quality and cost on the hot metal cost. It simulates various
operating scenarios to arrive at a minimum cost hot metal
solution by integrating three models, namely the Blast Furnace
model, the Coal Blending model and the Coal Distribution model.


2. OBJECTIVE 

Optimization of raw material procurement at the corporate level
for different plants from different sources is complex.
There can be a number of sources with varying qualities,
costs and transport logistics. Supply from all of these
sources may be possible to all the plants. Various
constraints need to be addressed before any procurement plan
is finalized, e.g. limited supply from a source, variation in
coal quality, logistics of supply and the quantity required at
each unit to meet its production target. In many cases today,
only human experience and judgment are used to arrive at a
coal distribution plan.

The decision support system was developed with the aim to:

• Predict the input and output costs fairly accurately

• Optimize coal procurement and the hot metal solution

• Provide alternative choices to help decision making


3. APPROACH 

A lot of work has been done in establishing the relationships
between coal parameters and coke parameters. The coke
quality parameters considered are Micum 10 (M10), coke ash,
etc., and the coal parameters considered are coal ash, volatile
matter, etc. Though many techniques have been tried for the blast
furnace, statistical techniques have been found to be more
successful than other models for the process. This is
because the blast furnace is a multivariable process, governed by
blast temperature, blast pressure, blast volume, oxygen
enrichment, steam, top temperature, top pressure, above-burden
temperature, charging sequence, burden and many other variables.
The effects of these variables on the process, and the relations
among them, are still not well understood.


Standard linear programming fails to optimize a function where
the parameters involved exhibit a non-linear relationship. In
this software a combination of process models and the linear
programming method has been used. The process models are
basically statistical and heuristic models. The models helped in
working out the minimum cost hot metal production through
optimization of total coal cost. The system consists of the
following main components:

• Blast Furnace Production Rule Model

• Coal Blend Model

• Coal Distribution Model

• Decision Support System (DSS) Simulator & Interface


The cost of hot metal production is the sum of coke cost, blast
furnace operational cost, blast furnace burden cost, fixed cost,
freights, interest & depreciation and the blast furnace gas.
Burden cost consists of the cost of iron ore, sinter and other burden
materials. The coke cost is the sum of landed coal cost, fixed cost,
interest & depreciation, operational cost of the coke oven and the
returns from the by-products. Landed coal cost in the plant is
the sum of two components: the basic coal cost and the freight charges
incurred to transport the coal to the plant. Coal is the major source
of cost in a steel plant. It is approximately 55-65% of hot metal
cost. When proper coking coal is scarce within the
country, coal is imported. This involves a foreign exchange
component too. To address this issue, software programs have
been written and a database-oriented software approach has
been adopted for coal procurement from indigenous as well as
imported sources for an optimum solution.
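
The cost build-up described above can be sketched as a few small functions. This is only an illustration under stated assumptions: the function names and all figures are hypothetical, every term is taken to be expressed per ton of hot metal, and by-product returns and blast furnace gas are assumed to enter as credits (subtracted).

```python
def landed_coal_cost(basic_cost, freight):
    """Landed coal cost = basic coal cost + freight to the plant."""
    return basic_cost + freight


def coke_cost(landed_coal, fixed, interest_depreciation,
              oven_operating, by_product_returns):
    """Coke cost; by-product returns are assumed to be a credit."""
    return (landed_coal + fixed + interest_depreciation
            + oven_operating - by_product_returns)


def hot_metal_cost(coke, bf_operating, burden, fixed, freight,
                   interest_depreciation, bf_gas_credit):
    """Hot metal cost; blast furnace gas is assumed to be a credit."""
    return (coke + bf_operating + burden + fixed + freight
            + interest_depreciation - bf_gas_credit)


# Hypothetical figures, all per ton of hot metal.
coal = landed_coal_cost(basic_cost=130.0, freight=20.0)
coke = coke_cost(coal, fixed=10.0, interest_depreciation=5.0,
                 oven_operating=15.0, by_product_returns=8.0)
total = hot_metal_cost(coke, bf_operating=30.0, burden=35.0, fixed=8.0,
                       freight=4.0, interest_depreciation=6.0,
                       bf_gas_credit=5.0)
coal_share = coal / total  # fraction of hot metal cost due to coal
```

With these illustrative numbers the coal share comes out at 60%, inside the 55-65% band quoted in the text.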


3.1. BLAST FURNACE (BF) PRODUCTION RULE MODEL

Coke is the major input that affects the performance of a blast
furnace. It is difficult to predict the effect of coke quality on
blast furnace productivity and coke rate. Working rules were
therefore defined from experience and validated by
blast furnace experts from different steel plants. The
working rules predict the coke quality requirements, in terms of
M10 and coke ash, for a targeted productivity. The working
rules were validated using production data, and their results
for the blast furnace were found to be satisfactory and
reliable. The working rules are written in PL/SQL and the
data generated is stored in Oracle.




3.2. COAL BLEND MODEL

The coal blend model defines the relationship between coal quality
parameters and coke quality parameters. The model considers
only measurable and regularly monitored parameters such as
volatile matter and ash content. It uses volatile matter (VM)
and ash to characterize the coal blend, and M10 and coke ash to
characterize the coke. The aim is to decide the coal blend quality
parameters for a specified coke quality. The program for this model
has been written in the C programming language. The equations are
established for each steel plant separately, based on the plant
operating practices and the technological regimes.


3.3. COAL DISTRIBUTION MODEL 

The coal distribution model is based on an optimization
program. It generates the minimum cost coal linkage plan
subject to the desired coal blend quality. The model
incorporates the following constraints:

• Coal availability

• Coal quality

• Coal quantity requirements

• Coal quality requirements

• Coal transport linkage

• Imported coal requirement limit

• Coal type wise requirement

The model also uses various cost parameters, such as basic coal
cost, freight cost and rupee/dollar parity, for calculating the
minimum cost coal linkage plan. As per the BF model, different
combinations of coke quality parameters may give similar hot
metal production. Also, different coal blends may yield the same
coke quality. In such situations, the cases are evaluated on an
economic scale. This module is written in Pro*C and is
closely coupled with the blast furnace and coal blend models.
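
To make the idea of a minimum cost linkage plan concrete, the sketch below searches blend shares on a coarse grid and keeps the cheapest blend that respects an ash limit and per-source supply limits. It is not the Pro*C optimization module described above: the brute-force enumeration merely stands in for the optimizer, and the source names, costs, ash values and limits are all hypothetical.

```python
from itertools import product

# Hypothetical sources: landed cost per ton, ash %, maximum share of blend (%).
SOURCES = {
    "indigenous_A": {"cost": 60.0, "ash": 20.0, "max_share": 60},
    "indigenous_B": {"cost": 85.0, "ash": 16.0, "max_share": 50},
    "imported": {"cost": 120.0, "ash": 9.0, "max_share": 70},
}


def cheapest_blend(sources, max_blend_ash, step_percent=5):
    """Return (cost per ton, {source: share %}) or None if nothing is feasible."""
    names = list(sources)
    best = None
    grid = range(0, 101, step_percent)
    # Enumerate shares for all but the last source; the last takes the rest.
    for head in product(grid, repeat=len(names) - 1):
        rest = 100 - sum(head)
        if rest < 0:
            continue
        shares = dict(zip(names, list(head) + [rest]))
        if any(shares[n] > sources[n]["max_share"] for n in names):
            continue  # supply / usage limit violated
        ash = sum(shares[n] * sources[n]["ash"] for n in names) / 100
        if ash > max_blend_ash:
            continue  # blend quality requirement violated
        cost = sum(shares[n] * sources[n]["cost"] for n in names) / 100
        if best is None or cost < best[0]:
            best = (cost, shares)
    return best


plan = cheapest_blend(SOURCES, max_blend_ash=15.0)
```

Even in this toy case the cheapest coal cannot simply be maximized: the ash limit forces part of the blend onto the costlier, cleaner source, which is the effect the text describes.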
3.4. DSS SIMULATOR AND INTERFACE

The DSS engine simulates various operating practices for a
minimum cost hot metal solution by integrating the three
models, viz. the blast furnace model, the coal blending model and
the coal distribution model. The DSS simulator provides for:

• Definition of the present operating scenario

• Simulation of different operating scenarios

• Application of the blast furnace model to work out coke
quality & quantity requirements

• Application of the coal blending model for blend needs

• Integration of the various cost and currency conversion rate
parameters and the coal distribution model

• Working out of a least cost solution for each scenario

• Cost optimization for different production ranges

This module provides a suitable GUI (Graphical User
Interface) to facilitate entry of operational base data, coal
sources, constraints, cost figures for coal and other burden
materials, interest, depreciation, operating cost, freight charges,
currency conversion rate, etc. It also provides
interfaces to see the details of the different outputs in a
user-friendly manner. All GUIs have been developed using Oracle
Developer tools.


4. SCHEME

The application software has been developed for minimization
of the cost of hot metal at company level through an optimized
procurement plan for coal. To validate the model with plant
operating data, it was felt necessary to carry out the optimization
for one plant at a time and later integrate it for the whole
company. The different inputs to this system are:

• Present operating parameters, e.g. hot metal production,
productivity, coke rate, coke ash, etc.

• Blast furnace volume, working days, etc.

• Coal source, type, quality, supply constraints, freight, cost,
transportation loss, handling loss, etc.

• Blast furnace coke yield, i.e. coal to coke ratio

• Coal usage constraints by source, imported coal usage,
quality constraints

• Operational fixed costs for coke oven & blast furnace

• Currency conversion rate

The model calculates the different costs over the given range of hot
metal production in small steps. It outputs the coal cost, coke
cost, variable cost and total cost of production. For each
production level, it gives the coal procurement plan from
different sources at minimum cost. The model displays different
kinds of trend graphs, such as hot metal productivity vs. coal
cost per ton of hot metal, hot metal productivity vs. coke cost
per ton of hot metal, hot metal productivity vs. variable cost per
ton of hot metal, hot metal productivity vs. hot metal cost per
ton, etc. Subsequently one can also generate trend graphs for
imported coal percentage vs. coal cost / coke cost / variable cost /
hot metal cost. These graphs are useful for a country like India,
where good quality coking coal is scarce and there is
always a need to import it.
RESULTS & DISCUSSION

Coal accounts for a significant part of hot metal cost. Selection
of coal for an integrated steel plant is complex, as it is governed by
several logistical factors. The most economical coal procurement plan
changes with the coal cost, the transportation cost and
the currency conversion rate. Considering all these factors,
deciding on the best option is difficult.

Trials were conducted with real-life data for Bhilai Steel Plant,
SAIL. The simulation model revealed that the minimum hot metal
cost/ton was achieved at a productivity level x, corresponding
to x' % of imported coal. Coal from the captive mines of SAIL
was shown to be used to its maximum capacity at the minimum
hot metal cost/ton. The model indicated that higher use of imported
coal leads to a higher level of productivity, but the hot metal
cost/ton shoots up. This was on account of coal from captive mines
being replaced by imported coal. The cost difference between
the two was quite high.


CONCLUSION 

The developed software is an excellent tool for finalizing the coal
procurement plan with a view to minimizing hot metal cost and
achieving the desired coke quality at minimum cost. It can also be
used to assess the effect of Rupee/Dollar parity and of the quality
of individual coals on the coal procurement plan and hot metal cost.


FUTURE SCOPE 

The Coal Blending and Blast Furnace models can be further refined.
The rule-based part of the model may be replaced by the equations
governing the process. The system can be used in any steel
industry by tuning the model parameters to the plant
practices.
TABLE VI: Comparison of factors of attrition with nature of work group (Duncan's mean test)
W1 = Medical Professionals, W2 = Nurses and Paramedics, W3 = Administrators

Factors of Attrition | W1 vs W2 | W1 vs W3 | W2 vs W3 | F-Value

Compensation and Perks
Work life balance
Sense of accomplishment
Work load leading to




2.94 3 2.92 51 2.92 .62 - - 
NS: Not Significant   * Significant at 0.05 level   ** Significant at 0.01 level





TABLE VII: Comparison of factors of attrition with income group (Duncan's mean test)
I1 = Up to Rs. 10,000/-, I2 = Rs. 11-20,000/-, I3 = Rs. 21-30,000/-, I4 = Rs. 31-40,000/-, I5 = More than Rs. 40,000/-


[2 Vs 13 
I3Vs 15 
13 Vs M 
ll ¥s B 


Il Vs 15 
I3 Vs 4 
1] Vs B 


14 Vs I5 
n Ys 15 
Il Vs 15 
Il Vs B 
13 Vs 14 
2 Vs 4 
Il Vs 14 





NS: Not Significant   * Significant at 0.05 level   ** Significant at 0.01 level
ABSTRACT

The 24 Hour Knowledge Factory [1] is a work culture in which
different people contribute collaboratively
to various modules of the same project.
As the approach matured, however, it was found
to be as difficult to realize as it is to imagine. A smooth work
flow among the personnel demands attention. This paper
discusses a software solution that eases implementation of this idea by
designing a workflow system for programmers
working at different places in the 24-hour realm. The
software presents user interfaces that enable an employee to
easily grasp the work done so far. The interface creates
optimized tables generated using rough set theory. This
theory gives a fair view of the work required by providing
lower and upper approximations, along with various rules that
help to find these optimum sets. The software also
helps the developer on the immediately following shift to be sure
of the code on which he is going to work.


Keywords: 24 Hour Knowledge Factory; Workflow Design;
Rough Set; Upper Approximation; Lower Approximation;
24-Hour Development; Follow the Sun


1. INTRODUCTION 


The 24 hour knowledge factory may be considered as a process of
working in shifts at different places which are not only
geographically distant but also temporally far from each other
[1]. It involves the collaboration of three or more centers in
different time zones, handing over work to each other in shifts.
The centers are connected using the internet or dedicated networks,
which are used to pass knowledge from one work location to
another. Each center completes its work in its given time, and
the work is then handed over to another center whose daytime
corresponds to this center's night. This
continues until the whole 24 hour cycle is completed.

The concept of the 24 hour knowledge factory is not new.
Work to improve it has been going on for more than a decade.
Past work is summarized in [1,13,14], which bring different
perspectives to the problem. All of them discuss the
problem of bringing the concept to life. Commercial
products based on "Follow the Sun", alias the 24 hour knowledge
factory, were also introduced to the market by IBM and HP [15].
For the effective utilization of sequential workers distributed
across time zones, tasks must be broken down so that they
require no interaction with peers. In addition, the effort required in
transitioning from one employee to the next should be minimal.


This paradigm requires new methodologies and tools that will
allow an individual to understand, in 16 minutes, the work done
by others in the preceding 16 hours [7]. This model also
requires the introduction of time and state as search parameters for
knowledge discovery, in order to enable individuals to
understand the sequence of changes being made in a project.
These requirements of a 24 hour knowledge factory can be
met with the introduction of a Composite Personae (CP).





Figure 1: Cycle in 24 Hour Knowledge Factory
[Adopted from Ref. 1]


A CP is a highly cohesive micro-team that simultaneously possesses
the properties of both an individual and a collection of individuals.
It is designed to act like a singular entity even though it
comprises three or more individuals at multiple sites. In a
CP, only one site is active at a single point in time. Thus,
development can proceed in a manner similar to the traditional one,
with the difference being that a component is owned by a CP
and not by an individual. The CP is actively involved in the process
of development and conflict resolution on a round-the-clock
basis [1].

Present day operating systems do not have adequate
support to facilitate this concept. Application of rough set
theory gives an idea of the work that has to be done by
providing the lower and upper approximations and various
rules that help to find these optimum tables. The rest of the paper
consists of five sections: the motivation for this
software, a review of rough set theory, the application of rough
sets to the 24 hour knowledge factory, future work and conclusion.


2. MOTIVATION FOR SOFTWARE 

As an analogy to code generation in shifts, consider three
people building the same wall in successive shifts. The
specification for construction of the wall states that each brick
is to be placed horizontally. Person A does his work
efficiently by placing bricks horizontally and finishes one third
of the wall by the end of the shift. Person B is then
handed the work in the next shift, inefficiently places the
bricks vertically and delivers the work to person C at the
end of his shift. Person C is unaware of the inefficient work
done by person B and continues to do his own work
efficiently by placing the bricks horizontally. At this stage, if
the work is handed over to person A in the next cycle, how can
he be sure that the work done so far has been done
efficiently?

Likewise, in code building, each company has norms
for the use of, e.g., 'if then else', 'else if' and 'if then' statements.
A developer who uses these constructs
without following the protocols makes
the code partially non-compliant with the company's norms. This
inefficiency can be checked by the 24 hour knowledge factory, so
the developer on the next shift can be sure of the code in which
he is going to invest time. It may also be possible to find out,
through the history log, which developer delivered the inefficient work.
One may argue that the Concurrent Versions System (CVS) [3]
and the knowledge factory both work on the basic idea of
maintaining the history of the same project across
temporal differences, but there is a considerable
difference in how the relevant information is processed and
maintained in the database.

CVS just acts as a repository of information, whether that is code
or the author of the code, whereas the knowledge factory
maintains the 'knowledge', that is, only the relevant and useful
information. The knowledge factory incorporates software
that maintains this useful knowledge instead of acting as a
repository and saving all the information available.

The knowledge factory is capable of eliminating redundant data
provided by the user and retrieving the rest when needed,
whereas CVS just retrieves previously stored data without
processing it. The knowledge factory can process the code written
and can differentiate its basic components, such as classes,
modules, basic elements, etc., whereas CVS is not capable of
this at all.


3. ROUGH SET THEORY 

Rough set theory has been used in many applications, varying
from fault diagnosis to economic prediction [2,4,6]. It
gives a crisp basis for selecting and deselecting the
components of an entity based on its lower and upper
approximations. One can thus decide on the lower and
upper approximations of a given entity in cases of
vagueness and uncertainty. Classical rough set theory
was introduced by Zdzisław I. Pawlak [5,10,11] in 1982.

a. INDISCERNIBILITY 

Let $\mathcal{A} = (U, A)$ be an information system. Then with any $B \subseteq A$
there is associated an equivalence relation $IND_{\mathcal{A}}(B)$:

$$IND_{\mathcal{A}}(B) = \{(x, x') \in U^2 \mid \forall a \in B,\ a(x) = a(x')\}$$

where $IND_{\mathcal{A}}(B)$ is called the B-indiscernibility relation. If
$(x, x') \in IND_{\mathcal{A}}(B)$, then objects $x$ and $x'$ are indiscernible from
each other by attributes from $B$. The equivalence classes of the
B-indiscernibility relation are denoted $[x]_B$.


b. SET APPROXIMATION 
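Let $B \subseteq A$ and $X \subseteq U$. We can approximate $X$ from the
information contained in $B$ by constructing the B-lower and B-upper
approximations of $X$, denoted $\underline{B}X$ and $\overline{B}X$ respectively,
where

$$\underline{B}X = \{x \mid [x]_B \subseteq X\}$$
$$\overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\}$$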

The accuracy of the rough-set representation of the set $X$ is
defined as:

$$\mu_B(X) = |\underline{B}X| / |\overline{B}X|$$

The accuracy $\mu_B(X)$, with $0 \le \mu_B(X) \le 1$, is the ratio of the
number of objects which can definitely be placed in $X$ to the
number of objects that can possibly be placed in $X$.
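
The definitions above translate almost directly into code. The following is a minimal sketch, not the paper's C# implementation: the module table, its attribute names and the target set are hypothetical, chosen only to show the lower approximation, the upper approximation and the accuracy ratio on a small example.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids that agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attributes):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


def accuracy(lower, upper):
    """Ratio |lower| / |upper|; defined as 1.0 when the upper set is empty."""
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical module table: object id -> attribute values.
TABLE = {
    "m1": {"language": "C#", "tested": "yes"},
    "m2": {"language": "C#", "tested": "yes"},
    "m3": {"language": "C#", "tested": "no"},
    "m4": {"language": "C", "tested": "no"},
    "m5": {"language": "C", "tested": "no"},
}
X = {"m1", "m3", "m4", "m5"}  # hypothetical target set of modules

lower, upper = approximations(TABLE, ["language", "tested"], X)
acc = accuracy(lower, upper)  # 3 / 5 = 0.6
```

Here m1 and m2 are indiscernible, and only m1 belongs to X, so neither can be placed in X with certainty; both remain in the upper approximation.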


c. REDUCTS

Some attributes can be deleted from an information system
while the necessary attributes are kept. A minimal subset of
attributes which ensures the same quality of classification as
the set of all attributes is called a reduct of the information system.
The intersection of all reducts is called the core. The core is the
collection of the most significant attributes for the classification
in the system.
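
A reduct and the core can be found by exhaustive search on small tables. The sketch below is an assumption-laden illustration rather than the algorithm of [8]: it treats "same quality of classification" as "same partition of the objects", tries attribute subsets in order of increasing size, and uses a hypothetical four-row table.

```python
from itertools import combinations


def partition(table, attrs):
    """Partition of object ids induced by the given attributes."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return {frozenset(g) for g in groups.values()}


def reducts(table, attributes):
    """All minimal attribute subsets that give the same partition as all attributes."""
    full = partition(table, attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if partition(table, subset) == full:
                found.append(subset)
    return found


def core(all_reducts):
    """Attributes common to every reduct."""
    if not all_reducts:
        return set()
    return set.intersection(*(set(r) for r in all_reducts))


# Hypothetical table in which "owner" carries the same information as "language".
TABLE = {
    "m1": {"language": "C#", "approach": "OO", "owner": "A"},
    "m2": {"language": "C#", "approach": "procedural", "owner": "A"},
    "m3": {"language": "C", "approach": "OO", "owner": "B"},
    "m4": {"language": "C", "approach": "procedural", "owner": "B"},
}
ATTRS = ["language", "approach", "owner"]

all_reducts = reducts(TABLE, ATTRS)
core_attrs = core(all_reducts)
```

Either "language" or "owner" can be dropped, but "approach" appears in every reduct and therefore forms the core. The search is exponential in the number of attributes, so it only suits small tables.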


d. RULE GENERATION 

Rules represent extracted knowledge, which can be used when
classifying new objects. Rules are created from the condition
attribute values of the object class. They are presented in the
form of "if-else" statements. The decision part is the
resulting part of the rule. Rules that have the same conditions but
different decisions are called inconsistent rules.
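
As a small illustration of rule generation, the sketch below builds one rule per distinct combination of condition values and flags the inconsistent ones. The row format, attribute names and decision values are hypothetical.

```python
def generate_rules(rows, condition_attrs, decision_attr):
    """Map each combination of condition values to the set of decisions seen."""
    rules = {}
    for row in rows:
        condition = tuple((a, row[a]) for a in condition_attrs)
        rules.setdefault(condition, set()).add(row[decision_attr])
    return rules


def inconsistent_rules(rules):
    """Rules whose identical conditions lead to more than one decision."""
    return {cond: dec for cond, dec in rules.items() if len(dec) > 1}


# Hypothetical decision table.
ROWS = [
    {"approach": "OO", "reviewed": "yes", "status": "accept"},
    {"approach": "OO", "reviewed": "no", "status": "rework"},
    {"approach": "OO", "reviewed": "yes", "status": "rework"},
]

rules = generate_rules(ROWS, ["approach", "reviewed"], "status")
conflicts = inconsistent_rules(rules)
```

The first and third rows share the same conditions but disagree on the decision, so that rule is reported as inconsistent.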


4. ROUGH SET THEORY IN 24 HOUR KNOWLEDGE 
FACTORY 

The 24 hour knowledge factory requires better infrastructure and
workflow design to provide an interface to the programmer
joining in the second shift. This knowledge-rich workflow
environment [16,17] uses the rough set approach [8,9], which
provides the needed information exactly, intelligently and in
compact form. It is implemented as a C# WinForms
application. Part of the class diagram can be seen in Appendix
I. We have made various assumptions regarding our project, as
stated below:


• The hardware support at the different places is assumed
to be equal.

• There should be a predefined format for the security
password and login IDs. The latter can be the programmer's ID.

• The current project's Software Requirement Specification (SRS)
is to be provided for the assessment of various data fields. The
upper and lower approximations of the SRS have
to be matched against those obtained when the work is
passed to the module developed, at the end of a particular
shift.

• The SRS must specify the number of objects, classes, functions
and modules in the project.

The user needs to log in to access the knowledge factory.
This can later be integrated with the corporate account login.
The user may work on three different sides, viz. the
developer, documenter and tester sides.

One side of the WinForms application is the developer side.
On the developer side we maintain a form which
enables us to view three different tables [12] that are
necessary from the developer's point of view and should be
provided to the developer working on another shift of
the same project. The tables designed are the
information metric table, the history log table and the modular table.
The modular table gives the name of the module, the approach
followed, the function names, the dates of start and completion, and
the language, hardware and software used for designing the
particular module. The rough set approach
is then used to provide the other developer with just the information
he must know in order to proceed with his
task. Without the rough set approach, a lot of time
would be wasted in reading the work done by one
developer and deciding what is to be done next.








Figure 2: Design Hierarchy


Besides this, when the current rough set approximations are
compared with the required approximations (specified
by the SRS), the next developer can be more confident of the work
done by the previous developer.


5. IMPLEMENTATION 

We have considered various classes, which have various
objects and functions that are to be considered during
attribute generation. Thus, we consider two different
ways of attribute generation:

• Objects of the various classes may be considered to create the
attribute table.

• Functions in the various classes.

Copy Right © BIJIT - 2012 Vol. 4 No. 1 ISSN 0973 - 5658

Therefore, we have considered two different tables in our
project. For the time being they are filled in explicitly by the
programmer. Rough set rules can then be used to find
the optimum set of attributes, and the upper and lower
approximations are found with the help of these
rules.


6. COMPARISON WITH THE SRS 
It is essential to have software requirement specifications
which show the desired set of classes, functions and objects.
The information given by the SRS is used to find the
desired lower approximation and upper approximation of the
project.
Suppose the lower approximation and upper approximation
of the SRS are given by Ls and Us, and those of the current code
by Lc and Uc respectively. Then the targets are:
• Lc should be equal to Ls.
• Uc should be equal to Us.
They may not be exactly equal until the code is complete; Lc
and Uc will be some percentage of Ls and Us, given by:
Current lower error = Lc/Ls × 100
Current upper error = Uc/Us × 100


This value should lie within a given range. It
can be used to keep track of the code being written in the
discrete time domains, and it assures the programmer
writing the current code that he is correctly following the code
which was written by the previous programmer.
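
The comparison with the SRS can be sketched as below. Two points are assumptions rather than statements from the paper: the approximations are compared by cardinality, and the acceptance range is a free parameter. The element names are hypothetical.

```python
def percent_of_required(current, required):
    """Size of the current approximation as a percentage of the SRS one."""
    if not required:
        raise ValueError("SRS approximation must not be empty")
    return len(current) / len(required) * 100


def within_range(value, low, high):
    """True if the percentage lies inside the agreed range (inclusive)."""
    return low <= value <= high


# Hypothetical approximations: from the SRS (Ls, Us) and from the current code (Lc, Uc).
Ls = {"Login", "Module", "History", "Metric"}
Us = {"Login", "Module", "History", "Metric", "Report"}
Lc = {"Login", "Module", "History"}
Uc = {"Login", "Module", "History", "Metric"}

lower_pct = percent_of_required(Lc, Ls)    # 3 of 4 -> 75.0
upper_pct = percent_of_required(Uc, Us)    # 4 of 5 -> 80.0
on_track = within_range(lower_pct, 70, 100) and within_range(upper_pct, 70, 100)
```

Both values reach 100 only when the code matches the SRS approximations in size; a stricter check could also compare the sets element by element.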


7. PROJECT WINDOW FORMS 

Three points of view are considered in the 24 hour knowledge
factory, as three main groups of people are
involved in software project development, i.e. the developer,
the tester and the documenter. In this paper we concentrate on the
code developer's viewpoint.

The developer view includes three tables, the Information Metric
Table, the Module Table and the History Log, to store information
about the code developed. The Information Metric Table stores
general information about the module developed (Figure 3). The
Module Table stores specific information about the module
developed (Figure 4). The History Log stores date and time
details of the module developed (Figure 5).

The data is entered by each developer at the end of his shift in
these three tables. This data is stored in the database which is
maintained at the back end. Depending upon the decision
attributes, the Lower Approximation (LA) and Upper
Approximation (UA) are then found by using the algorithm for
approximation computation [8]. Sets which are
indiscernible can be reduced to a single tuple, which optimizes the
subsets to a certain degree. The approximations are
calculated specifically to match the lower and upper approximations
with those stated in the SRS (Software Requirement Specification).
Figure 3: Information Table





Figure 4: Module Table 


Similarly, reducts can be found by applying the reduct and core
computation algorithm [8]. A relative reduct contains the
same amount of information as the non-reduced data
set; hence, we can call it extracted data. This is the data which
will be shown to the developer working on the immediately following
shift to brief him about the status of the work done.
FUTURE WORK
The login event can be connected to a database maintained
specifically for verification of the user name and password.
Currently a predefined user name and password are used
for login verification.

The Software Requirement Specification is an essential input in
any project. When the SRS changes, due to changes in
market requirements or for any other reason, the whole of
the input changes, which tends to disturb the output in an
unexpected way. At worst this could hamper work progress,
which is the main goal of the 24 hour knowledge factory. This
could be addressed by agreeing on amendments and predefined
norms at the time of the SRS agreement, or by a system which is
ready to accept changes made in the SRS and is less dependent
on it.

Currently only the developer's side has been
considered. The other two sides, i.e. the tester's and the
documenter's, also need to be implemented.






Figure 5: History Table 


CONCLUSION 

The goal of this paper is to generate an interface for managing the
24 hour knowledge factory by implementing rough set theory.
The software supports work across places
which are not only geographically distant but also temporally
far from each other, by letting a developer grasp the state of the
work as quickly as possible from the optimised tables. The software
also helps the developer on the immediately following shift to be sure
of the code on which he is going to work.
ABSTRACT

The present study aimed to test the constituents, as well as the
complete theories, of Transactional and Transformational
Leadership behaviour of salespersons with respect to customer
relationship marketing behaviour in the Indian banking scenario. For
Transactional Leadership it was hypothesized that the contingency
reward system and management by exception of the salesperson
positively affect customer trust and customer commitment, and that
together they contribute to customer relationship behaviour.
For Transformational Leadership it was hypothesized that the
idealized influence behaviour, individualized consideration,
intellectual stimulation and inspirational motivation behaviour of
salespersons positively affect customers' trust, customers'
commitment, customer assumptions and customers' optimistic
engagement. Non-probabilistic sampling methods were used. A
survey was conducted among 61 salespersons and their
customers in the Indian banking sector, and regression
analysis was performed to test the hypotheses. The results show
that the contingency reward system influences the customer
relationship to a certain extent, while management by
exception is not so appropriate for maintaining the
relationship with the customer, though it shows a correlation.
In the case of transformational leadership, the idealized
influence behaviour of salespersons positively influences
customer trust; individualized consideration of salespersons in
turn influences customer commitment; intellectual stimulation
encourages creativity and changes earlier assumptions of the
customer; and inspirational motivation influences the optimistic
engagement of customers. It was also found that the combined
effect of all the constituents of Transformational Leadership
theory is positively related to customers' relationship
commitment. These conclusions suggest that the two theories are
complementary, and point to how leadership
development training can be adapted to improve the relationship
marketing skills of salespersons.


Keywords: Transactional Leadership, Contingency Reward,
Management by exception, Transformational leadership,
Individualized consideration, Intellectual stimulation,
Inspirational motivation, Idealized influence behaviour,
Customer Relationship marketing, customer trust, customer
commitment, assumptions of customers, optimistic engagement
of customers, Sales force, Business-to-Business marketing,
Banking, India.


1. INTRODUCTION 

Transformational leadership is relevant and desired, and it
brings positive change in followers: the leader enhances their
motivation, morale and performance through his or her idealized
influence, intellectual stimulation, individual consideration and
inspirational motivation. Bass added to the initial concepts of
Burns (1978) to help explain how transformational leadership
could be measured, as well as how it impacts follower
motivation and performance [1], [2]. The followers of such a
leader feel trust, commitment, admiration, inspiration, loyalty
and respect for the leader. James MacGregor Burns (1978)
first introduced the concept of “transforming leadership” [2].
Burns established two concepts: “transforming leadership” and
“transactional leadership.” Transformational and transactional
leadership (Bass & Avolio, 1991) are both dependent on
perceptions [3].

A transactional leader relies largely on a series of
“transactions”. This person is interested in looking out for
oneself, having exchange benefits with subordinates and
clarifying a sense of duty with rewards and punishments to
reach goals (Bass, B. M.) [Il]. Bass suggested that a leader can
simultaneously display transformational and transactional
leadership. Years of research and a number of meta-analyses
have shown that transformational and transactional leadership
positively predict a wide variety of performance outcomes,
including individual, group and organizational level variables
(Bass & Bass, 2008, “The Bass Handbook of Leadership: Theory,
Research, and Managerial Applications”, 4th edition, Free
Press) [4]. Charismatic and transformational leadership models
have attracted considerable research attention (Conger &
Kanungo, 1987) [5]. The banking sector plays a relevant and
dynamic role in the economic development of a country by
acting as a centre of interest and a barometer of the financial
system. Liberalization of the Indian banking sector in the early
1990s resulted in the emergence of new horizons, which gave
dynamism to the market and enhanced customer expectations.
Indian banks need to adopt and implement innovative
relationship marketing strategies to maintain the competitive
edge in the marketplace. How are banks misled by an
over-reliance on technology and confusion regarding leadership
roles? Leadership roles are not confined to managerial
behaviour. The salesperson perspective requires a holistic
understanding of customer perceptions and a leadership strategy
for multi-faceted relationships. There has not been any
significant attempt to merge transformational leadership and
marketing, in spite of the major outcomes of transformational
leadership; and where such attempts exist, the full
transformational and transactional leadership styles have not
been tested, at least in the Indian banking scenario. In
business-to-business (BTB) marketing, the communication usually
includes both the selling organization and the individual
salespersons (Doney and Cannon, 1997) [6]. Meta-analyses have
shown that transformational leadership is significantly related
to important effectiveness dimensions, e.g., higher performance
ratings and enhanced innovativeness, and more importantly so in
BTB services (Egri and Herman, 2000) [7].


2. AIM OF THE STUDY 

The objective of the study is to examine the impact of the
transactional leadership behaviour and transformational
leadership behaviour of salespersons on customer relationships
in the Indian banking sector, considering the transactional and
transformational leadership aspects, the leader-follower and
salesperson-individual customer relationships, and customer
relationship marketing behaviour. The study examines whether
the transactional leadership behaviour of the individual
salesperson influences customers’ trust in the salesperson and
their relationship commitment with the salesperson in the
Indian banking sector, and whether the transformational
leadership behaviour of the individual salesperson influences
customer trust, customer commitment, assumptions of the
customer, optimistic engagement of the customer and their
relationship commitment with the salesperson. Once the impact
of both theories is known, the findings can be implemented as a
training module to enhance innovative customer relationship
marketing in the Indian banking sector.


3. REVIEW OF RELATED LITERATURE 

The literature on Transformational Leadership, Transactional
Leadership and customer relationship theories in the popular
press and scholarly work is vast and continues to expand
because of its usage, diversity and implementation by
academicians and researchers. Not only is the literature vast,
it is often scattered. The literature review is arranged as
follows: Customer Trust, Customer Commitment, Customer
Assumptions, Customer Optimistic Engagement, Contingency
Reward, Management By Exception, Individualized Consideration,
Intellectual Stimulation, Inspirational Motivation and
Idealized Influence.


3.1 CUSTOMER TRUST 

Building trust between salespeople and their customers has
traditionally been considered a relevant element in developing
and maintaining a successful sales relationship.

Trust is considered a strategic variable in current marketing
(Selnes, 1998) [8]. Trust and commitment are the key
variables of relationship marketing (Morgan and Hunt, 1994)
[9]. Such an assumption is nothing more than a trust credit
offered to others before experience can provide a more
rational interpretation (Gefen et al., 2003, p. 62) [10]. Some of
the items of CT: “Disclose my financial secrets which may
help my salesman to make my credit decisions.” “Ignore the
bad word of mouth (talking negatively) about my salesman.”


3.2 CUSTOMER COMMITMENT

Commitment is a vital ingredient for successful long-term
relationships. Rather than routinely trying to meet or exceed
every customer’s wildest expectations, sales departments began
studying customer habits. The trick was to reward customer
loyalty. Customers and their salespersons tend to believe that
long-term relationships are a decisive source of competitive
advantage (e.g., Ganesan, 1994) [11]. The outcomes for the
customer of such a long-term orientation (Anderson and Weitz,
1992) are committed relationships that improve quality and
process performance as well as increasing access to valued
resources and technologies [12]. Considerable research has
been done to illuminate the correlation of social
aspects in business relationships such as commitment,
satisfaction, long-term orientation, dependence and trust
(Garbarino and Johnson, 1999) [13]. Taking these views into
account, customer commitment was measured by items such as
“going beyond the business relationship with my CRO in order to
maintain the business relationship with him/her” and
“appreciating my CRO’s work to his/her colleagues”. Some of the
items of CC: “Recommending my salesman to my business
colleagues for their dealings.” “Ask any problem any time from
salesman without any hesitation.”


3.3 CUSTOMER ASSUMPTIONS 

Whether it's in regard to our sales efforts, during a discussion,
or when trying to uncover ways to best manage customers,
certain assumptions can dramatically affect the results we seek
to achieve. This is especially true for research purposes. When
clients ask for help in closing more sales, ask them to list the
objections they hear that prevent the sale. It's when they start
stumbling over their response that I ask, "Are these the
objections you are hearing directly from your prospects or
what you're assuming as the reason why they don't buy?"
Rather than uncovering the real barrier to the sale, assuming
where the objection lies becomes a detrimental process that
spreads like a virus throughout every sales call. These
assumptions are not based on fact but rather on the
salesperson's assumption of the truth. The problem arises when
the salesperson fails to invest the time to go beyond the
obvious and to explore the prospect's specific objectives or
concerns. Thinking they “know” this prospect, the salesperson
provides them with the benefits of his service that he perceives
to be important, without considering the prospect's particular
needs. Going beyond such assumptions creates more selling
opportunities. Some of the items of CA: “Whether services
offered by salesman are beneficial in nature, in comparison
with other products.” “Whether salesman is able to change the
perceived assumptions of customer.”


3.4 CUSTOMER OPTIMISTIC ENGAGEMENT

Optimism is the disposition to take a favourable view of events
or conditions and to expect the most favourable outcome. The
impact of transformational leadership styles on followers'
effectiveness and motivation has also been documented (Bass &
Avolio, 1990) [14]. It is a tendency to expect the best possible
outcome or to dwell on the most hopeful aspects of a situation:
"There is a touch of optimism in every worry about one's own
moral cleanliness" (Victoria Ocampo). As a doctrine, it asserts
that this world is the best of all possible worlds [15].

• to come nearer in position, time, quality, character, etc., to
  (someone or something).
• to make advances to, as with a proposal, suggestion, etc.
• a means adopted in tackling a problem, job of work, etc.;
  ideas or actions intended to deal with a problem or
  situation; "his approach to every problem is to draw up a
  list of pros and cons"; "an attack on inflation"; "his plan of
  attack was misguided"
• access: a way of entering or leaving; "he took a wrong turn
  on the access to the bridge."

Some of the items of CP: “Whether customer feels that future
plans of banks are feasible and possible as communicated by
salesman.” “Whether customer engages himself with future
plans of banks as told by salesman.”


3.5 CONTINGENT REWARD 

As many academics have pointed out, while researchers have
learned a great deal about the effects of contingent reward
(CR) leader behaviour, relatively little is known about its
genesis. CR is traditionally viewed as an independent variable
which exerts influence. The final phase in the creation of a
customer service training program should be reward; Kerr &
Slocum (1987) and Schein (1985) suggest that reward systems
may work like a layer-cake [16], [17].


3.6 MANAGEMENT BY EXCEPTION

Management by Exception is a "policy by which management 
devotes its time to investigating only those situations in which 
actual results differ significantly from planned results. The idea 
is that management should spend its valuable time 
concentrating on the more important items (such as shaping the 
company's future strategic course). Attention is given only to 
material deviations requiring investigation." 


3.7 INDIVIDUALIZED CONSIDERATION 

Recent empirical evidence indicates that individualized
consideration is an important leadership behaviour in the
workplace (Sarros, Gray, & Densten, 2003) [18]. Bass (2000)
identified a developmental orientation and individualized
attention to followers as important aspects of individualized
consideration [19]. Such a leader pays special attention to each
individual’s needs for achievement and growth (Hinkin and
Tracey, 1999) [20]. Some of the items of ICARE: “Salesman treats
me as an individual rather than just as any other group
member/customer.” “Salesman spends time on individual
customer queries and problems and tries to give the best of his
services.”


3.8 INTELLECTUAL STIMULATION 

Intellectual stimulation is defined as having a leader who
stimulates and applauds innovation and creativity, as well as
critical thinking and problem-solving. Another research
initiative, conducted by Hetland and Sandal in 2003 among
177 subordinates and superiors of mid-level Norwegian
managers in five different healthcare organizations, showed a
positive correlation in their application of the intellectual
stimulation factor, which was defined as a leader
articulating new ideas that prompt followers to rethink
conventional practice and thinking [21]. Some of the items of
IS: “Salesman re-examines critical assumptions to question
whether they are appropriate.” “Salesman seeks different
perspectives when solving problems.”


3.9 INSPIRATIONAL MOTIVATION 

Leaders with inspirational motivation challenge followers with
high standards, communicate optimism about future goals, and
provide scope for the task at hand. Followers need to have a
strong sense of purpose if they are to be motivated to act.
Purpose and meaning provide the energy that drives a group
forward. This is an example of inspirational motivational
leadership, which is part of the full-range or
transformational/transactional leadership model espoused by
Burns beginning in 1978 [2]. Intrinsically motivated
salespeople seek peer recognition and put the organization and
the customers before their own interests (Kunz and Pfaff,
2002) [22]. Some of the items of IM: “Salesman articulates a
compelling vision of the services provided and its future
benefits.” “Salesman talks optimistically about future and
requirements for that.”


3.10 IDEALIZED INFLUENCE

Provides a role model for high ethical behaviour, instils pride,
gains respect and trust.

• Comprehensive vision: “I believe that this is truly the right
  thing to do.”
• General characteristics.
  - Respects, trusts, and demonstrates confidence.

Idealized influence refers to the behaviour characterized by
self-confidence, determination, persistence, high competency
and willingness to take risks. Some of the items of II:
“Salesman goes beyond self interest for the good of the
customer.” “Salesman acts in a way that builds customer respect
for him.”


3.11 TRANSACTIONAL LEADERSHIP (HYPOTHESES 
AND ITS BACKGROUND) 
Some thinkers may be interested in knowing how transactional
leadership as a whole influences followers’ relationship
commitment; others provide some evidence on the impact of
individual components of transactional leadership. In
particular, transactional leaders concentrate on the exchange
process whereby the leader secures the effort of followers
through the use of desired incentives (Bass and Avolio, 1990)
[14]. These incentives are usually offered as
contingent reinforcement or management-by-exception (Avolio
and Bass, 1991) [3]. Also, transactions between leader and
follower may be denoted as either active or passive (Hater &
Bass, 1988) [23]. Logically, a transactional leader is more
likely to implement only those service-oriented behaviours that
are in the spirit of the activities one would expect a
transactional leader to undertake; for a salesman, these
include discounts and various schemes.

H1: The higher the Contingent Rewards of a salesperson, the
higher will be the customer trust in that salesperson.

H2: The higher the Management By Exception of a
salesperson, the higher will be the customer commitment to
that salesperson.

Berry (1995) stresses that attracting new customers should be
viewed only as an intermediate step in the marketing process
[24]. He proposed relationship marketing as attracting,
creating, maintaining and, in multi-service organizations,
enhancing customer relationships. Berry’s notion of customer
relationship management resembles that of Gummesson and
Evert (1981) [24], [25]. Armstrong and Seng (2000) identify
trust as an antecedent of commitment [26]. The
commitment-trust theory of relationship marketing by Morgan
and Hunt (1994) also proposes certain variables that contribute
to the achievement of trust and commitment [9].

H3: The higher the Customer Trust in a salesperson, the higher
will be the customer commitment with the salesperson.

Since both the Management By Exception of a
salesperson and customer trust affect customer commitment,
the following hypothesis is derived:

H4: The effect of the Contingent Reward and Management By
Exception (Transactional Leadership) behaviour of a salesperson
is relevant and positive on customer relationship commitment.


3.12 TRANSFORMATIONAL LEADERSHIP

A transformational leader would succeed in getting a
change plan implemented by intellectually stimulating the
followers (Bass, 2000), which motivates them to rethink old
ways of doing business [19]. Empirical tests of the
extraordinary effects of transformational leaders on followers
have become known as tests of the ‘augmentation hypothesis’
(Bass, 1985; Hater & Bass, 1988) [19], [23]. Theoretical and
empirical research has identified the relevant role that
employee behaviours play in the formation of customers’
quality perceptions and loyalty behaviours (Maxham,
Netemeyer and Lichtenstein, 2008) [27]. It shows the
pathway. Successful and innovative marketers move beyond the
physical connections of product, price, place and promotion to
psychological connections. Some successful cases are The
Container Store and Harley-Davidson. Their business models
cater to employee and customer emotions, making them great
companies. Connected companies replace business transactions
with superior human interactions (Szymanski, David M.) [28].
Salespersons are constituents of society, and if they show
individualized consideration for the customer then it will
certainly increase customer commitment, which ultimately
enhances the relationship.

H5: The higher the idealized influence behaviour of a
salesperson, the higher will be the customer trust in that
salesperson.

H6: The higher the individualized considerate behaviour of a
salesperson, the higher will be the customer commitment to that
salesperson.

H7: The higher the intellectual stimulation of a salesperson, the
higher will be the change in the assumption of customer.

H8: The higher the inspirational motivation of a salesperson, the
higher will be the optimistic engagement of customer.

H9: The mutual effect of Idealized Influence, Individualized
considerate behaviour, Intellectual Stimulation and
Inspirational Motivation (Transformational Leadership) of a
salesperson is relevant and positive on the customer
relationship commitment (customer trust, customer
commitment, customer assumption and customer optimized
engagement).

Figure 1

4. RESEARCH METHODOLOGY

A questionnaire was used to collect the primary data to answer
the research questions and objectives regarding customers’
perception of the leadership behaviour of salesmen in India.
First, the study focuses on causal relationships, for which the
questionnaire method is more appropriate. Second, a valid and
reliable measure of transformational leadership is readily
available in questionnaire form.


5. POPULATION, SAMPLE, AND SUBJECTS 

A total of 110 questionnaires were distributed to banks
operating in India. The number of questionnaires delivered
to each bank was determined by the size of its customer
database. Of the questionnaires distributed, seventy
were collected, of which 9 were excluded due to
incomplete data, leaving 61 usable responses. Non-probabilistic
sampling techniques were used. Items were selected from
existing measurements to measure customer trust, customer
commitment, customer assumptions and optimistic engagement
of customers (four statements each), and relevant items of the
Multifactor Leadership Questionnaire (MLQ-5x) were selected to
measure two dimensions of transactional leadership and four
dimensions of transformational leadership. The questionnaire
included a total of 40 items. Items were discussed with some
senior bankers with experience in corporate banking with the
aim of improving the content validity of the measurements. Some
of the items were customized so that customers could relate to
the statements. All the variables of the present study were
measured on a 5-point scale anchored by “0” indicating “not at
all” and “4” indicating “frequently if not always”.


6. ANALYSIS AND FINDINGS 
The analysis was performed in SPSS version 17. The 
correlation table shows relevant correlation between variables. 


6.1 RELIABILITY ANALYSIS 

To assess the reliability and internal consistency of the data,
the Cronbach alpha test was performed. The variables exceeded
the value of 0.6 and were therefore considered reliable for the
study.


Group Name                   | Questions     | Cronbach Alpha
Contingency Reward           | Q-1,2,3,4     | 0.6491
Management by Exception      | Q-5,6,7,8     | 0.7748
Individualized Consideration | Q-9,10,11,12  | 0.6710
Intellectual Stimulation     | Q-13,14,15,16 | 0.6103

Table 1
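
The reliability check described above can be illustrated with a
minimal sketch. The study ran this test in SPSS; the Python code
below is only an assumed re-implementation of the standard
Cronbach alpha formula, alpha = k/(k-1) * (1 - sum of item
variances / variance of total scores), and the function name
cronbach_alpha and the sample answers are hypothetical, not data
from the survey.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of rows, one row per respondent and one
    column per questionnaire item (hypothetical layout).
    """
    k = len(responses[0])                  # number of items in the scale
    items = list(zip(*responses))          # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers of five customers to a four-item scale,
# each rated from 0 ("not at all") to 4 ("frequently if not always").
sample = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]

alpha = cronbach_alpha(sample)
# The study treats a scale as reliable when alpha exceeds 0.6.
print(round(alpha, 4), "reliable" if alpha > 0.6 else "not reliable")
```

Sample variances (n - 1 in the denominator) are used for both the
item and total-score variances; what matters is that the same
variance definition is applied to both.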


6.2 RELATIONSHIP ANALYSES 
In order to understand the correlation between
transformational leadership, transactional leadership and
customer relationship commitment, the matrix of correlation
coefficients is shown in Table 2. A higher coefficient indicates
a stronger correlation between variables.
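
As an illustration of how such a matrix of Pearson coefficients
is obtained, the following is a minimal sketch. It is not the
SPSS procedure used in the study; the function names pearson_r
and correlation_matrix and the scores are hypothetical, and the
variable labels CR and CT are used only as example keys.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(variables):
    """Return {(name_a, name_b): r} for every pair of variables."""
    names = list(variables)
    return {
        (a, b): pearson_r(variables[a], variables[b])
        for a in names
        for b in names
    }


# Hypothetical mean scale scores of six customers (not survey data).
scores = {
    "CR": [2.0, 3.5, 1.5, 4.0, 2.5, 3.0],   # contingency reward
    "CT": [2.5, 3.0, 2.0, 3.5, 2.0, 3.5],   # customer trust
}

matrix = correlation_matrix(scores)
for (a, b), r in matrix.items():
    print(a, b, round(r, 2))
```

The matrix is symmetric with ones on the diagonal, so only the
off-diagonal entries carry information about the hypotheses.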


6.3 CONTINGENCY REWARD AND CUSTOMER 
TRUST 

One of the most important steps is to verify the relationship
between contingency reward and trust. Table 2 displays
positive and significant correlations between contingency
reward and customer trust. These results provide support for a
consistent positive relationship between contingency reward
and customer trust. Hypothesis 1, that contingency reward is
positively correlated with customer trust, is therefore
supported.


[Table: Pearson correlation and two-tailed significance for
Contingency Reward and Customer Trust]





6.4 MANAGEMENT BY EXCEPTION AND CUSTOMER 
COMMITMENT 
Hypothesis 2 stated that the Management By Exception
behaviour of the salesperson is positively correlated with
Customer Commitment, and this was supported in
the findings. Table 2 indicates that there was a significant
positive correlation between Management By Exception and
customer commitment. However, this correlation was
moderate, r = 0.26, p < 0.01. Consequently, Hypothesis 2 is
supported.


[Table: Pearson correlation for Management by Exception and
Customer Commitment]







6.5 CUSTOMER TRUST AND CUSTOMER
COMMITMENT

The third and central hypothesis, that customer trust and
customer commitment are positively related, was supported in
the findings. There was a significant positive correlation
between customer trust and customer commitment, as shown in
Table 2, with r = 0.60. Hence Hypothesis 3 was supported.


[Table: Pearson correlation and two-tailed significance for
Customer Trust and Customer Commitment]

Table 4


6.6 TRANSACTIONAL LEADERSHIP AND
CUSTOMER RELATIONSHIP
The fourth and one of the most important hypotheses is that
contingency reward, management by exception, customer trust
and customer commitment are positively correlated. The results
show a consistent positive relationship between the variables
of transactional leadership (contingency reward, management by
exception) and customer relationship (customer trust, customer
commitment), as shown in the table below.


[Table: Pearson correlation and two-tailed significance for
Transactional Leadership and Customer Relationship]





6.7 IDEALIZED INFLUENCE AND CUSTOMER TRUST 
Hypothesis 5 stated that there exists a positive and significant
relationship between Idealized Influence and Customer Trust.
Table 2 shows a positive and significant relationship between
Idealized Influence and Customer Trust. Hypothesis 5 is
therefore supported.


Table 6


6.8 INDIVIDUALIZED CONSIDERATION AND
CUSTOMER COMMITMENT
Hypothesis 6 stated that the Individualized Consideration
behaviour of the salesperson is positively correlated with
Customer Commitment, and this was supported in
the findings. Table 2 indicates that there was a significant
positive correlation between Individualized Consideration and
Customer Commitment. This correlation was relevant, r =
0.59. Consequently, Hypothesis 6 is supported.


[Table: Pearson correlation for Individualized Consideration
and Customer Commitment]

Table 7


6.9 INTELLECTUAL STIMULATION AND CUSTOMER 
ASSUMPTION 
One of the most important relationships that needs to be
verified is the relationship between Intellectual Stimulation
and Customer Assumption. Table 2 displays positive and
significant correlations between Intellectual Stimulation and
Customer Assumption. The correlation was relevant, r = 0.53.
These results provide support for a consistent positive
relationship between Intellectual Stimulation and Customer
Assumption. Hypothesis 7, that Intellectual Stimulation is
positively correlated with Customer Assumption, is therefore
supported.


[Table: Pearson correlation for Intellectual Stimulation and
Customer Assumption]





6.10 INSPIRED MOTIVATION AND OPTIMIZED
ENGAGEMENT

Hypothesis 8 stated that the Inspired Motivation behaviour of
the salesperson is positively correlated with the Optimized
Engagement of the customer, and this was supported in the
findings. Table 2 indicates that there was a significant positive
correlation between Inspired Motivation and Optimized
Engagement. This correlation was relevant, r = 0.58.
Consequently, Hypothesis 8 is supported.


[Table: Pearson correlation and two-tailed significance for
Inspired Motivation and Optimized Engagement]

Table 9


6.11 TRANSFORMATIONAL LEADERSHIP AND
CUSTOMER RELATIONSHIP

Hypothesis 9 stated that there exists a positive and significant
relationship between Transformational Leadership (Idealized
Influence, Individualized Consideration, Intellectual
Stimulation, Inspired Motivation) and Customer Relationship
(customer trust, customer commitment, customer assumption,
optimized engagement). Table 2 shows a positive and
significant relationship between Transformational Leadership
and Customer Relationship. Hypothesis 9 is therefore supported.


Table 10
IM: Inspirational Motivation, OE: Optimized Engagement,
CT: Customer Trust, CM: Customer Commitment,
II: Idealized Influence, IC: Individualized Consideration,
IS: Intellectual Stimulation, CA: Customer Assumption.


7. MANAGERIAL IMPLICATION

This research tries to correlate the leadership aspects of
salesmen with customer relationship behaviour. By scrutinizing
each aspect of the transactional and transformational
leadership behaviour of salesmen in relation to customer
relationship behaviour, it can be concluded that a training
module programme can enhance the effectiveness of the
salesman-customer relationship. In India, relationship marketing
includes policies and procedures which are very basic in nature.
In fact, in some banks customers have to deal with delays and
ineffective services. Customer relationship is used merely as a
buzzword; there is no feedback, and the salesperson is not
sufficiently recognized as a potential means of implementing
the best relationship marketing. Research has shown the
importance of effective and individual selling (Doney and
Cannon, 1997; Beverland, 2001) and the importance of personal
interaction in the service industry (Armstrong and Seng, 2000)
[6], [29], [26]. In fact, there are well-developed transactional
and transformational leadership training modules, but there are
no such training modules in the area of relationship marketing,
especially for the banking sector in India. Though Bass
conceptualized that transformational and transactional
leadership can be employed to enhance the practice of personal
selling, he failed to ground his discussion in an appropriate
theoretical base.

In this paper we have tried to address this gap by linking the
complete transactional and transformational approaches and
determining their impact.


8. LIMITATION, FURTHER RESEARCH AND
CONCLUSION
The sample selected for the study suffers from several
constraints. The selection of banks, salespersons and
customers depended on their willingness to participate and on
their convenience. The present study considers two theories,
transactional leadership and transformational leadership;
future researchers can consider other dimensions, leadership
aspects and theories such as charismatic, influential and
behavioural theory, and can go on to improve leadership
theories further and adapt them for effective use with
relationship marketing. It is also very important to understand
how far these leadership theories can be applied to
salespersons. Another constraint is that loyal customers may
hold perceptions that influenced the assessment of the actual
leadership behaviour and its impact on the customer. The lack
of recognition for the salesperson in implementing relationship
marketing may be due to the lack of research on how salespeople
actually build relationships (Beverland, 2001) [29]. Future
research may focus on the whole leadership concept and include
all the leadership theories, including experimental leadership
theories; it can opt for more appropriate sampling techniques;
cross-national surveys can be conducted to understand
leadership aspects; and different financial institutions can be
included for research purposes. In fact, while most
transformational and transactional leadership models take it
for granted that followers attribute leadership qualities based
on face-to-face exchanges with the leader, the bulk of studies
in this area end up measuring distant as opposed to close
leadership relationships. Specifically, this study provides
evidence of transformational and transactional effects in a
real organizational setting, where followers were assessing the
leader they know and deal with on a daily basis. The evolution
of innovative, complex business strategies has increased the
challenge to banks to consider the salesman leadership
behaviour-customer relationship in a dynamic way, using
strategies that are new and effective. In designing this study,
our initial position was that both leadership styles are
necessary conditions for leadership traits. Transformational
leadership has been linked to outcomes such as leadership
effectiveness, innovativeness and quality improvement.
Transactional leadership was also positively correlated with
these outcomes but, in general, the relationships were
considerably weaker than those found for transformational
leadership. Finally, though both transactional and
transformational leadership styles evidently enhance service
performance, the impact of transformational leadership is
likely to be greater and stronger than that of its counterpart.
To maximize the satisfaction and performance levels of their
followers, leaders must possess charisma, provide
individualized consideration, and be intellectually stimulating
and inspiring to followers.


ABSTRACT

Nowadays various state governments and the central government
have taken initiatives to implement E-Governance in various
service areas, applying Information and Communication
Technology (ICT) to provide better transparency, accuracy and
security of services to citizens. In September 2005, the
Parliament of India passed the Mahatma Gandhi National Rural
Employment Guarantee Act (MGNREGA) to enhance livelihood
security by giving at least 100 days of guaranteed wage
employment in a financial year to every household in rural
India. E-Governance solutions help to simplify complex manual
activities and support transparent wage payment through
agencies like banks and post offices. In e-governance,
information is exchanged between communicating parties via the
Internet, and a message may be changed, modified or destroyed
by hackers during its transmission. So, information hiding is
needed when exchanging information via the Internet. In this
paper, we propose a tool, called the Public-Key Watermarking
algorithm, for integrity verification of the Job-Card (JC)
issued to individual households by state governments, so that
the watermark is capable of detecting any changes made to the
Job-Card by malicious users and can also identify fraudulent
wage payment.


Keywords: E-Governance, Watermark-Insertion, Watermark- 
Extraction, Cryptography, Digital Watermarking, Public-key, 
Private-key, JC, ICT, MGNREGA. 


1. INTRODUCTION 

According to the New Oxford English Dictionary, Government is
the sum total of the systems by which a state or community is 
governed. The Government of India has specified e-governance 
as nothing but “using IT to bring about SMART (Simple, 
Moral, Accountable, Responsive, Transparent) governance” 
[1]. The benefits of e-governance suggest that it is convenient 
and cost-effective for businesses and government service 
deliveries. By supplying the most current information to the public in an
easily accessible way, the government can save energy, time
and, above all, money. Another advantage of e-governance is
greater citizen participation in government activities in an
environmentally friendly way, as the number of paper exchanges is
far lower than in the conventional system. Though in
modern times every government organization is transforming
its operations into electronic form, the implementation of
e-governance also produces several risks which sometimes
negate the advantages. One of the major risks to the successful
implementation of e-governance is the security of information, as
all the important data about government, citizens and
businesses are available online; anyone can freely access
that information and, if they wish, can also change it easily.

The availability of large amount of information and increased 
use of multimedia across the Internet has become an effective 
way to provide services to people around the globe. The 
growing usage of multimedia content on the Internet generates 
several serious problems like fraud, forgery, counterfeiting, 
violation of copyright and piracy [8]. With the availability of 
new generation software and hardware, anyone can easily use 
the copyrighted material without being caught. In modern days, 
the transition from analogue and paper media to digital media 
has provided several benefits but also creates problems for the 
owner as the replicas of digital media cannot be distinguished 
from the original. To provide copyright protection of digital 
content, sometimes cryptographic approach is used but it does 
not completely solve the problem. To restrict unauthorized users
from accessing copyrighted digital information, a
technology referred to as watermarking has been developed. We
can divide digital watermarking into two main categories:
visible and invisible [8]. In visible digital watermarking, the
information is visible in the content and is equivalent to
stamping a watermark on paper. In invisible digital
watermarking, information is added as digital data to the content,
but it cannot be identified visually.

The Government of India (GoI), in September 2005,
launched an ambitious project, named MGNREGA, with the
hope of changing the socio-economic [1] structure of
rural India and all its citizens. The main purpose of MGNREGA
[1] is to develop long-term rural infrastructure as well as to
enhance the living standards of rural people. Under this act,
Gram-Panchayats play a pivotal role in the planning and
implementation of different schemes. The size and coverage of
the scheme demand a foolproof and secure system that can
ensure that benefits flow only to those for whom they are intended
[1]. So, we have developed a watermarking approach via which
we can support privacy, integrity, and authentication related 
issues of digital documents and give confidence to the user of 
the document that the transmission process is secure. 

In section 2, we highlight the construction of the Job-Card that
may be used in an ICT solution for the MGNREGA scheme, and
section 3 covers the basics of the Public-key Watermarking
technique. In section 4, we present methods of
incorporating the watermark and extracting it to
establish the authenticity of the Job-Card using the proposed
algorithm of the paper.
2. JOB CARD BASED E-GOVERNANCE

Under MGNREGA scheme, every rural family can register at 
Gram-Panchayats by filling a registration form that is kept 
under the supervision of village-head. Every registration related 
to wage seeker family will be sent to Computer Center (CC) at 
block level for higher-level processing. 

For every valid application, the CC will assign an ID comprising a
15 digit unique registration number. This registration number
contains two parts: an 11 digit code containing district, assembly,
block, Panchayat, and village information, and a 4 digit index
number for the individual family.
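
The two-part structure of the registration number can be illustrated with a short sketch. The class and function names are hypothetical, and only the 11 + 4 digit split is taken from the text; the widths of the individual district, assembly, block, Panchayat and village fields are deliberately not assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationNumber:
    """Hypothetical container for the two parts of a 15 digit registration number."""
    location_code: str  # 11 digits: district, assembly, block, Panchayat, village
    family_index: str   # 4 digits: index of the individual family


def parse_registration_number(reg_no: str) -> RegistrationNumber:
    """Split a registration number into its location code and family index."""
    if len(reg_no) != 15 or not (reg_no.isascii() and reg_no.isdigit()):
        raise ValueError("registration number must be exactly 15 digits")
    return RegistrationNumber(location_code=reg_no[:11], family_index=reg_no[11:])


# Example with a made-up registration number.
parsed = parse_registration_number("123456789010042")
```

Keeping both parts as strings preserves leading zeros, which matter for a fixed-width identifier of this kind.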

After generating the unique registration ID for every wage
seeker family, the CC will create a Job-Card, affix a
scanned image of the job seeker family in the designated space
within the Job-Card, and hand it over to the
Gram-Panchayats for delivery to the corresponding wage
seeker family. The wage seekers can directly draw cash from
paying agencies as per the wage list, by showing the Job-Card and
providing a thumb impression.

Here comes the necessity of the public-key watermarking [3]
technique, which not only restricts fraudulent wage payment
but also guarantees that paying agencies are not able to develop
their own wage list.

The whole application can easily be fitted into Government-to- 
Citizen (G2C) model, where government portion of the 
application is responsible for the creation, distribution and 
processing of the Job-Card and Citizen portion is only 
responsible for providing necessary information. 


Figure 1: A G2C Model (figure labels: Creation, Distribution)


The parameters included in the Job-Card are shown in the 
figure 2. 

The registration process in the MGNREGA scheme is 
described with the help of the figure 3. 


3. A G2C MODEL USING PUBLIC-KEY WATERMARKING 
Digital watermarking [5] is used to insert a digital signature 
into the content so that the signature can be extracted for the 
purposes of ownership verification and/or authentication. 
Digital watermarking is a way to protect ownership property 
from illegal usage. A watermark always resides permanently 
within the host information. The watermark is hidden in the 
host data in such a way that no one can separate it from the
original work but the work is still accessible. 


Figure 2: Schematic representation of Job-Card
mentioning different parameters (recoverable fields: Job Card registration number; District, 2 digits)

Figure 3: Registration Process in MGNREGA
Scheme (recoverable label: Panchayat computer)

It inserts the hidden information into the content, also called
the cover-media [6]. This hidden information is called the 
watermark. After inserting the watermark via specific
algorithms, the original media will be slightly modified and is
referred to as watermarked media. There might be no or little
perceptible difference between the original media content and
the watermarked one. After embedding the watermark, the
watermarked media are sent over the transmission channel to
the receiver, where they are checked for authenticity of the
content owner. A technique called watermark extraction [10] is
performed here for verification of the ownership. The
watermark information depends entirely on the application type.
The generalized algorithm of watermarking is described below.
Step 1: Initially, select a cover media.
Step 2: Insert hidden information (watermark) into the content (cover media).
Step 3: After watermark embedding, the watermarked media is sent to the receiver via the transmission channel.
Step 4: A watermark extraction approach is applied at the receiver end to identify the authenticity of the owner.

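
The four steps can be walked through with a minimal sketch. It uses least-significant-bit (LSB) embedding in a byte string, which is a common textbook technique chosen here only for illustration; it is an assumption of this example and not the scheme proposed in this paper. The function names and sample data are likewise hypothetical.

```python
def embed(cover: bytes, watermark: bytes) -> bytes:
    """Step 2: hide each watermark bit in the lowest bit of one cover byte."""
    bits = [(byte >> shift) & 1 for byte in watermark for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover media too small for this watermark")
    marked = bytearray(cover)
    for pos, bit in enumerate(bits):
        marked[pos] = (marked[pos] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(marked)


def extract(marked: bytes, n_bytes: int) -> bytes:
    """Step 4: rebuild the watermark from the lowest bits, most significant bit first."""
    out = bytearray()
    for k in range(n_bytes):
        value = 0
        for byte in marked[8 * k: 8 * k + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(200))                      # Step 1: select a cover media
watermark = b"JC-0042"                         # made-up watermark payload
marked = embed(cover, watermark)               # Step 2: insert the watermark
received = marked                              # Step 3: idealised, loss-free channel
recovered = extract(received, len(watermark))  # Step 4: extract at the receiver
```

Each cover byte changes by at most one, which matches the remark that the watermarked media differs little from the original.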
Watermark embedding and detection can sometimes be 
considered analogous to encryption and decryption in 
cryptography. There are two types of cryptographic approaches 
that we can use in watermark applications — Secret-key 
approach and Public-key approach. 
In Secret-key watermarking, we have an embedding function
that takes a message and an original work and outputs a
watermarked work. Similarly, we have a detection function,
which takes a watermarked work and outputs a message. The
mapping between watermarked works and the messages is
controlled by a watermark key. Watermarking algorithms based
on a Secret-key present a major drawback: they do not allow
public recovery of the watermark. In order to overcome this
limitation, Public-key watermarking algorithms have been
proposed; such systems use two types of keys: a public
and a private one. Content can be watermarked using the
private key, whereas the public key is used to verify the mark.
We might develop a Public-key system so that knowledge of
either key does not allow an adversary to find out the other key.
The public key can be widely distributed without risk of giving
away the private key. Depending on the application, either the
encryption key or the decryption key can be public.
The description of the Public-key allows feasible computation of
the mapping in only one direction. To implement Public-key
watermarking, the watermark embedder uses one watermark
key and the watermark detector uses a different watermark key.
The assumption is that knowledge of the detection key is not
sufficient to allow an adversary to remove a watermark.


4. PROPOSED ALGORITHM 

In our approach, we use two types of keys: a Public-key
(E) for Watermark-Insertion within the information present in the
Job Card (JC) and a Private-key (D) for
Watermark-Extraction from the watermarked message. The Encryption and
Decryption methods both use the
MODULAR EXPONENTIATION [3] technique, and the
modulus n, a very large number (256 bits), is created during the
key generation process using the conventional RSA algorithm
[3].
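
As an illustration of modular exponentiation with RSA parameters, the sketch below uses square-and-multiply with the small textbook primes 61 and 53. These toy values are an assumption made for readability; they are far smaller than the 256-bit modulus mentioned above and offer no security. The function name `mod_exp` is hypothetical.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus by square-and-multiply."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                  # multiply when the current bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus    # square for the next bit
        exponent >>= 1
    return result


# Toy RSA parameters (illustration only).
p, q = 61, 53
n = p * q                   # modulus
phi = (p - 1) * (q - 1)     # Euler totient of n
e = 17                      # public exponent, coprime to phi
d = pow(e, -1, phi)         # private exponent (modular inverse, Python 3.8+)

message = 65
cipher = mod_exp(message, e, n)
restored = mod_exp(cipher, d, n)
```

The loop performs a number of multiplications proportional to the bit length of the exponent, which is what makes exponentiation with very large exponents practical.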

a) Key Generation Algorithm: Using this algorithm, we
generate two types of keys that are used in the watermarking
application. This process is executed on the Government
side (Computer Center) of the G2C model.


1. Using the conventional RSA algorithm, we find e, n and d,
where e is the public exponent and d is the private exponent; we
choose the modulus n to be a very large number (256 bits)
for better security.

2. Then we apply an ITERATED HASH FUNCTION
[3] to the original information present in the Job-Card using
the generated e and d to generate the corresponding
Public-key (E) and Private-key (D).


b) Watermark Insertion Algorithm: This algorithm is
developed to insert the generated watermark into the original
information present in the Job-Card to provide security and
authenticity. This process is also executed on the Government
side (Computer Center) of the G2C model.
1. Let $M_k$ denote the $k$-th block of data [7] within
the message $I$.
2. Let $H(\cdot)$ be a Cryptographic Hash Function
such as MD5 [3]. We compute $H(M_k, E) = (m_1, m_2, \ldots, m_s)$,
where $s$ is the size of the MD (Message Digest) [in our
algorithm, we have chosen the minimum length of $s$ to be 256
bits].
3. Finally, we encrypt the generated result of
individual blocks with the encryption function $E(\cdot)$ using the
public key E to produce the corresponding watermarked
block $M_k'$.
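
The three insertion steps can be sketched as follows, under several stated assumptions: SHA-256 stands in for the hash function so that the digest is 256 bits long; H(M_k, E) is read as hashing the block followed by the key bytes; and the toy RSA key is built from two publicly known Mersenne primes, giving a modulus larger than the 256 bits mentioned in the text so that every digest fits below it. All names are hypothetical and the key is not secure.

```python
import hashlib
from typing import List, Tuple

# Toy RSA key from two known Mersenne primes (NOT secure; illustration only).
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q
E_EXP = 65537
D_EXP = pow(E_EXP, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+


def split_blocks(message: bytes, block_size: int = 16) -> List[bytes]:
    """Step 1: cut the message I into blocks M_k."""
    return [message[i:i + block_size] for i in range(0, len(message), block_size)]


def block_digest(block: bytes, e: int) -> int:
    """Step 2: H(M_k, E), assumed to be SHA-256 over the block plus the key bytes."""
    key_bytes = e.to_bytes((e.bit_length() + 7) // 8, "big")
    return int.from_bytes(hashlib.sha256(block + key_bytes).digest(), "big")


def insert_watermark(message: bytes, e: int, n: int) -> List[Tuple[bytes, int]]:
    """Step 3: transform each digest by modular exponentiation with key e."""
    return [(block, pow(block_digest(block, e), e, n)) for block in split_blocks(message)]


# Made-up Job-Card content.
job_card = b"reg_no=123456789010042;name=Example Family"
watermarked = insert_watermark(job_card, E_EXP, N)
```

Each entry pairs a data block with its watermark value, so a later check can recompute the digest per block.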


Here, in the watermark insertion process [9, 12], the inputs to the
scheme are the watermark (unique Job-Card registration
number), the cover-media (Job-Card) and a Public-key
(generated using our proposed algorithm). The Public-key is
used to enforce security, that is, to prevent
unauthorized parties from recovering and manipulating the
watermark. The output of the watermarking scheme is the
watermarked Job-Card, which will be delivered to the wage
seeker families via Panchayats.

Here, we achieve better security by applying multiple keys
to the information of the Job-Card. Before watermarking, we
encrypt the information using some encryption keys and then
apply watermark information on that encrypted information. In
this way, the use of two different keys allows us to provide better
security, as no one can interpret the actual hidden information
unless they possess both keys.






Figure 4: Digital Watermark Insertion Scheme (figure labels: Watermark (W), Public Key (E), Watermark Embedder, watermarked output)


c) Watermark Extraction Algorithm: This algorithm is
developed to extract the generated watermark from the
watermarked information in the Job-Card to identify the
authenticity of the message. This process is executed on the
client side (Bank, Post-Offices) of the G2C model.


1. We split the watermarked message $I''$ into $s$
blocks.

2. Apply a decryption function $D(\cdot)$ on individual blocks $M_k'$
using the private-key D to produce the corresponding
block $M_k$ of the watermarked message $I''$.

3. Finally, we apply the same Hash function, which was used to
encrypt the message, on $M_k$ to produce the final message
using private-key D and generate authenticity information
about the message as the final outcome.

4. Depending on the generated authenticity information, the
Job-Card will either be accepted or rejected.
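
One possible reading of the extraction steps is sketched below: each watermark value is decrypted with the private exponent, the digest of the received block is recomputed, and the Job-Card is accepted only if every pair matches. This interpretation, the use of SHA-256, the assumption that the verifier also knows the public exponent, and the toy key built from known Mersenne primes are all assumptions of this example. The names and sample data are hypothetical.

```python
import hashlib
from typing import List

# Toy RSA key from two known Mersenne primes (NOT secure; illustration only).
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q
E_EXP = 65537
D_EXP = pow(E_EXP, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+


def block_digest(block: bytes, e: int) -> int:
    """Digest of one block, assumed to be SHA-256 over the block plus the key bytes."""
    key_bytes = e.to_bytes((e.bit_length() + 7) // 8, "big")
    return int.from_bytes(hashlib.sha256(block + key_bytes).digest(), "big")


def embed(blocks: List[bytes], e: int, n: int) -> List[int]:
    """Government side: produce one watermark value per block."""
    return [pow(block_digest(block, e), e, n) for block in blocks]


def verify_job_card(blocks: List[bytes], marks: List[int], e: int, d: int, n: int) -> bool:
    """Client side: accept only if every decrypted mark matches its block digest."""
    if len(blocks) != len(marks):
        return False
    for block, mark in zip(blocks, marks):
        recovered = pow(mark, d, n)              # step 2: decrypt with the private key
        if recovered != block_digest(block, e):  # step 3: recompute and compare
            return False
    return True                                  # step 4: accept


blocks = [b"reg_no=123456789", b"010042;wage=100"]   # made-up Job-Card blocks
marks = embed(blocks, E_EXP, N)
genuine = verify_job_card(blocks, marks, E_EXP, D_EXP, N)
forged = verify_job_card([blocks[0], b"010042;wage=900"], marks, E_EXP, D_EXP, N)
```

Changing a single block alters its digest, so the altered card fails the comparison while the untouched card passes.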

In the Watermark extraction process [2, 4, 11], the inputs to the
scheme are the watermarked data, the Private-key (generated by
the same proposed algorithm that is used to generate the public
key), and the original watermark. The output of the scheme
gives a confidence measure indicating whether
the test data is authentic or not.

Here, we achieve better security by using multiple keys at
different levels of the application. Initially, the watermark is extracted
from the information of the Job-Card using the watermark key;
successful extraction guarantees the
authenticity of the owner. Then we apply the decryption key to the
authenticated information to recover the actual information
from the encrypted information. In this way, the two-tier
application of keys hides valuable
information from intruders more effectively, and only knowledge of both
keys makes it possible to extract the information from the Job-Card.
Thus the level of security is very high if we use the above
mentioned approach.


Figure 5: Digital Watermark Extraction Scheme (figure labels: Watermark (W), Test Data (I''), Private Key (D))


CONCLUSION 

Here, we have used a public key for watermark insertion and a
private key for watermark detection. So, any person can
perform the authenticity check by simply using a private key
and a watermark detector device. Though the approach
specifies a direct relationship between Watermarking and
Cryptography, some fundamental differences exist
between them. In Public-key Watermarking, the mapping
between a Job-Card and the information within the Job-Card is
many-to-one, so that given information may be embedded in
any given Job-Card. On the other hand, in Public-key
Cryptography, the mapping between cipher-text and plain-text
is always one-to-one.

Our approach combines the advantages of both Watermarking
and Cryptography and produces a robust system to keep secure
information hidden from intruders. At the time of Job-Card
production, we apply both the cryptographic and watermarking
approaches, and at the time of extracting information from that
Job-Card, we again apply both,
but in the reverse order.
The proposed approach is robust enough to protect against
malicious attacks performed by intruders. Any changes
made to either the Job-Card or its information can easily be
detected, and thus the purpose of security is maintained.


FUTURE SCOPE 

The Public-key algorithm stated here requires much more
computation than a Secret-key algorithm. It is impractical to
encrypt and decrypt large messages using the above method.
So, it is common to use a Secret-key algorithm for the transmission
of large amounts of data and to use a Public-key algorithm
to transmit its key. As the space available for
watermarking in any content is very limited, it is
not possible to apply many kinds of watermarking
techniques in some applications. Our proposal is
to compress the whole data using any compression algorithm
before applying the watermark information. In this way
we can reduce both the computation time and the storage
requirement.
ABSTRACT

Today, many people carry numerous portable devices, such as
laptops, mobile phones, PDAs and mp3 players, for use in their
professional and private lives. For the most part, these devices
are used separately; that is, their applications do not interact.
Imagine, however, if they could interact directly: participants 
at a meeting could share documents or presentations; all 
communication could automatically be routed through the 
wireless corporate campus network. These examples of 
spontaneous, ad hoc wireless communication between devices 
might be loosely defined as a scheme, often referred to as ad 
hoc networking, which allows devices to establish 
communication, anytime and anywhere without the aid of a 
central infrastructure. 

This paper describes the concept of mobile ad hoc networking
(MANET) and points out some of its applications that can be
envisioned for the future. The paper also presents two of the
technical challenges MANET poses: Geocasting and QoS.


Keywords: Ad hoc networking, MANET, MIPMANET,
Personal Area Network (PAN), Bluetooth technology, QoS,
Geocasting.


1. INTRODUCTION 

A Mobile Ad-hoc NETwork (MANET), also known as Mobile 
Packet Radio Networking, is a collection of wireless mobile 
nodes dynamically forming a temporary network without the 
use of any existing infrastructure or centralized administration. 
Since the nodes in a network of this kind can serve as routers 
and hosts, they can forward packets on behalf of other nodes 
and run user applications. 

MANETs are networks in which mobile routers are connected
via wireless links forming dynamic topologies. An important
function of network management in a MANET is to observe
network conditions: at the node level, this may mean keeping
track of the traffic load; at the network level, the system must
monitor active routes and changes in the network topology [1].
MANETs have their own advantages, such as high robustness
and ease of setup, despite resource constraints like limited
bandwidth and power. Typical applications of MANETs are in
tactical networking and disaster recovery operations. Recently, 
the rising popularity of multimedia applications among end 
users in various networks and the potential usage of MANETs 
in civilian life have led to research interest in providing QoS 
support in MANETs. It is a huge challenge to provide QoS in 
MANETs. A network's ability to provide a specified quality of
service between a set of endpoints depends upon inherent
properties such as the delay, throughput, loss rate and error rates of
links and nodes.

Many mobile phones and other electronic devices already are 
or will soon be Bluetooth-enabled. Consequently, the ground 
for building more complex ad hoc networks is being laid. In 
terms of market acceptance, the realization of a critical mass is
certainly positive. But perhaps even more positive, as far as
the end user is concerned, is that consumers of Bluetooth-enabled devices
obtain a lot of as-yet unraveled ad hoc functionality at virtually
no cost.

The purpose of this paper is to propose technological 
requirements for the successful working of MANETs. The 
main features of the proposal are (1) to show efficient routing 
algorithms as a necessity to develop MANETs and (2) to 
provide excellent quality of service (QoS). 

The remainder of the paper is structured in
five sections. The first section reviews the background of ad hoc
networking. The second section describes the MANET technology.
The third section elaborates on Mobile IP for mobile ad hoc
networks (MIPMANET). The fourth section focuses on some of
the significant applications of MANETs. The fifth section provides
the technological requirements for the development of an ad
hoc network and discusses the importance of QoS and
Geocasting in MANETs.


2. HISTORY OF AD-HOC NETWORKING 

The roots of ad hoc networking can be traced back as far as 
1968, when work on the ALOHA network was initiated (the 
objective of this network was to connect educational facilities 
in Hawaii). Although fixed stations were employed, the
ALOHA protocol lent itself to distributed channel-access 
management and hence provided a basis for the subsequent 
development of distributed channel-access schemes that were 
suitable for ad hoc networking. The ALOHA protocol itself 
was a single-hop protocol, that is, it did not inherently support 
routing. Instead every node had to be within reach of all other 
participating nodes. 

Mobile Ad Hoc Network is a name currently being given to a 
technology under development for the past 20 or so years, 
principally through research funding sponsored by the U.S. 
Government. Its initial sponsors included the Defense 
Advanced Research Projects Agency (DARPA), the U.S. Army 
and the Office of Naval Research (ONR). [8] 

Inspired by the ALOHA network and the early development of
fixed network packet switching, the Defense Advanced Research
Projects Agency (DARPA) began work, in 1973, on the PRnet
(packet radio network), a multihop network. In this context,
multihopping means that nodes cooperated to relay traffic on
behalf of one another to reach distant stations that would
otherwise have been out of range. PRnet provided mechanisms
for managing operation centrally as well as on a distributed
basis. As an additional benefit, it was realized that
multihopping techniques increased network capacity, since the
spatial domain could be reused for concurrent but physically
separate multihop sessions. Although many experimental
packet radio networks were later developed, these wireless
systems never really took off
in the consumer segment. When developing IEEE 802.11, a
standard for wireless local area networks (WLAN), the
Institute of Electrical and Electronics Engineers (IEEE)
replaced the term packet-radio network with ad hoc network.
Packet-radio networks had come to be associated with the
multihop networks of large-scale military or rescue operations,
and by adopting a new name, the IEEE hoped to indicate an
entirely new deployment scenario. The ad hoc devices can also
relay traffic between devices that are out of range.

Mobile ad hoc wireless networks differ fundamentally both in 
functionality and capability from their static wireline network 
counterparts due to a variety of reasons, including random node 
mobility, unpredictable network dynamics, fluctuating link 
quality, limited processing capabilities, power constraints, etc. 
All of these characteristics give rise to a need for dynamic 
changes both in the functioning and management of the 
underlying network. 


3. MANET-THE TECHNOLOGY 

A Mobile Ad hoc NETwork (MANET) consists of mobile
platforms (each platform logically consisting of a router,
possibly with multiple hosts and wireless communications
devices), herein simply referred to as "nodes", which are free
to move about arbitrarily. A MANET is an autonomous system
of mobile nodes. The nodes may consist of separate, networked
devices, or may be integrated into a single device such as a
laptop computer. The nodes may be located in or on airplanes,
ships, trucks, cars, perhaps even on people, and there may be
multiple hosts per router. The nodes are equipped with wireless
transmitters and receivers using antennas which may be
omnidirectional (broadcast), highly directional (point-to-point)
or some combination thereof. At a given point in time,
depending on the nodes' positions and their transmitter and
receiver coverage patterns, transmission power levels and
co-channel interference levels, wireless connectivity in the form
of a random, multihop graph or "ad hoc" network exists
between the nodes. This is in contrast with the topology of the 
existing Internet, where the router topology is essentially static 
(barring network reconfiguration or router failures). In a 
MANET, the routers are mobile and inter-router connectivity 
may change frequently during normal operation. Unlike 
conventional wireless networks, ad hoc networks have no fixed 
network infrastructure or administrative support. The topology 
of the network changes dynamically as mobile nodes join or 
depart the network or radio links between nodes become
unusable [4]. A MANET may operate either in isolation or
connected to the greater Internet via gateway routers.
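
The idea that connectivity is a multihop graph determined by node positions can be made concrete with a small sketch. It assumes a simplified model in which two nodes share a link whenever their distance is within a single common radio range, ignoring antenna patterns, power levels and interference; routes are found with a plain breadth-first search rather than any particular MANET routing protocol. All names and coordinates are hypothetical.

```python
from collections import deque
from math import dist  # Python 3.8+
from typing import Dict, List, Optional, Set, Tuple

Position = Tuple[float, float]


def build_links(positions: Dict[str, Position], radio_range: float) -> Dict[str, Set[str]]:
    """Link two nodes whenever they are within radio range of each other."""
    nodes = list(positions)
    links: Dict[str, Set[str]] = {node: set() for node in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if dist(positions[a], positions[b]) <= radio_range:
                links[a].add(b)
                links[b].add(a)
    return links


def multihop_route(links: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest relay path, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in sorted(links[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


positions = {"A": (0.0, 0.0), "B": (8.0, 0.0), "C": (16.0, 0.0)}
route_before = multihop_route(build_links(positions, 10.0), "A", "C")

positions["B"] = (8.0, 50.0)  # the relay node moves away
route_after = multihop_route(build_links(positions, 10.0), "A", "C")
```

Node A reaches C only through the relay B, so moving B out of range removes the route, which shows how mobility alone changes the topology.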

MANETs have several salient characteristics:

• Dynamic topologies: Nodes are free to move arbitrarily;
thus, the network topology, which is typically multihop,
may change randomly and rapidly at unpredictable times.
Adjustment of transmission and reception parameters such
as power may also impact the topology.

• Bandwidth-constrained, variable capacity links:
Wireless links will continue to have significantly lower
capacity than their hardwired counterparts. One effect of
the relatively low to moderate link capacities is that
congestion is typically the norm rather than the exception,
i.e. aggregate application demand will likely approach or
exceed network capacity frequently.

• Power-constrained operation: Some or all of the nodes in
a MANET may rely on batteries for their energy. For these
nodes, the most important system design criterion for
optimization may be power conservation.

• Limited physical security: Mobile wireless networks are
generally more prone to physical security threats than are
fixed, hardwired nets. Existing link security techniques are
often applied within wireless networks to reduce security
threats.


3.1 MIPMANET 

Mobile IP for mobile ad hoc networks (MIPMANET) is
designed to give nodes in ad hoc networks

• Access to the Internet; and

• The services of mobile IP.

The solution uses mobile IP foreign agents as access points to
the Internet to keep track of the ad hoc network in which any
given node is located and to direct packets to the edge of that
ad hoc network.

The ad hoc routing protocol is used to deliver packets between
the foreign agent and the visiting node. A layered approach that
employs tunneling is applied to the outward data flow, to
separate the mobile IP functionality from the ad hoc routing
protocol. This makes it possible for MIPMANET to provide
Internet access by enabling nodes to select multiple access
points and to perform seamless switching between them. In
short, MIPMANET works as follows:

• Nodes in an ad hoc network that want Internet access use
their home IP addresses for all communication, and
register with a foreign agent.

• To send a packet to a host on the Internet, the node in the
ad hoc network tunnels the packet to the foreign agent.

• To receive packets from hosts on the Internet, packets are
routed to the foreign agent by ordinary mobile IP
mechanisms. The foreign agent then delivers the packets to
the node in the ad hoc network.

• Nodes that do not require Internet access interact with the
ad hoc network as though it were a stand-alone network;
that is, they do not require data regarding routes to
destinations outside the ad hoc network.

• If a node cannot determine from the IP address whether or
not the destination is located within the ad hoc network, it
will first search for the visiting node within the ad hoc
network before tunneling the packet.

Study of Impact of Mobile Ad-Hoc Networking and its Future Applications

In MIPMANET, only registered visiting nodes are given
Internet access; thus the only traffic that will enter the ad hoc
network from the Internet is traffic that is tunneled to the
foreign agent from a registered node's home agent. Likewise,
traffic that leaves the ad hoc network is tunneled to the foreign
agent from a registered node. This results in a separation
between, and thereby the capacity to control, traffic that is local
in the ad hoc network and traffic that enters the ad hoc
network.
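
The forwarding behaviour described in this section can be summarised as a small decision function. This is a simplified sketch of the rules stated above, not an implementation of the MIPMANET protocol: the function name, the action labels and the way locality is represented are all hypothetical.

```python
from typing import Callable, Optional

AD_HOC_ROUTE = "deliver_via_ad_hoc_routing"
TUNNEL = "tunnel_to_foreign_agent"
UNREACHABLE = "unreachable"


def choose_action(dest_is_local: Optional[bool],
                  registered_with_foreign_agent: bool,
                  search_ad_hoc_network: Callable[[], bool]) -> str:
    """Decide how a node handles an outgoing packet.

    dest_is_local is True or False when the IP address settles the question,
    and None when the node cannot tell from the address alone.
    """
    if dest_is_local is True:
        return AD_HOC_ROUTE
    if dest_is_local is None and search_ad_hoc_network():
        # The destination was found inside the ad hoc network after all.
        return AD_HOC_ROUTE
    if registered_with_foreign_agent:
        # Only registered nodes may send traffic out through the foreign agent.
        return TUNNEL
    return UNREACHABLE


local_case = choose_action(True, False, lambda: False)
found_by_search = choose_action(None, True, lambda: True)
internet_case = choose_action(None, True, lambda: False)
unregistered_case = choose_action(False, False, lambda: False)
```

Passing the search as a callable means it runs only when the address is inconclusive, mirroring the rule that a node searches the ad hoc network first and tunnels afterwards.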


4. APPLICATIONS OF AD-HOC NETWORKING
Characterized by their flexibility to be deployed and functional 
in on-demand situations, combined with their capability to 
transport a wide spectrum of applications, mobile ad hoc 
networks (MANETs) are gaining rapid momentum both in the 
commercial and military arenas. To turn mobile ad hoc 
networks into a commodity, we should move to a more
pragmatic “opportunistic ad hoc networking” in which
multihop ad hoc networks are not isolated self-configured 
networks, but rather emerge as a flexible and low-cost 
extension of wired infrastructure networks coexisting with 
them. [7] 


4.1 MILITARY SECTOR 

The ad hoc packet-radio networks have mainly been considered 
for military applications, where a decentralized network 
configuration is an operative advantage or even a necessity. In 
the military sector, MANETs are becoming the basis for the 
future network-centric warfare (NCW) paradigm as 
exemplified by the Future Combat Systems (FCS) and 
Warfighter Information Network-Tactical (WIN-T) programs. 
The success of MANETs is however critically tied to their 
capability of transporting a wide spectrum of applications with 
varying quality of service (QoS) requirements or service level 
agreements (SLAs), and providing continued/un-interrupted 
service (i.e. seamless recovery) despite failures in the 
underlying network. 

Today, MANETs enable war fighters to benefit from a 
sophisticated Internet protocol (IP)-based communications 
network that can be set up even in difficult terrain and in 
remote war zones. Furthermore, tactical network applications 
of MANETs also include realization of automated battlefields, 
wherein autonomous robots and autonomous ground vehicles 
are used to explore hostile battlegrounds and check for land 
mines. These significant strides have made ad hoc networking a 
very valuable option in modern tactical military
communication networks and the industry is facing significant 
demand for MANET solutions from defense establishments 
worldwide. 


4.2 COMMERCIAL SECTOR-PAN 

Short-range ad hoc networks can simplify intercommunication 
between various mobile devices (such as a cellular phone and a 
PDA) by forming a PAN, and thereby eliminate the tedious 
need for cables. This could also extend the mobility provided
by the fixed network (that is, mobile IP) to nodes further out in 
an ad hoc network domain. The Bluetooth system is perhaps 
the most promising technology in the context of personal area 
networking (PAN). 

Figure 1. PAN scenario with four interconnected PANs, two of
which have an Internet connection via a Bluetooth LAN access
point and a GPRS/UMTS phone.


A PAN can also encompass several different access 
technologies distributed among its member devices which 
exploit the ad hoc functionality in the PAN. For instance, a 
notebook computer could have a wireless LAN (WLAN) 
interface (such as IEEE 802.11) that provides network access 
when the computer is used indoors. Thus, the PAN would 
benefit from the total aggregate of all access technologies 
residing in the PAN devices. 

Figure 1 shows a scenario in which four Bluetooth PANs are
used. The PANs are interconnected via laptop computers with 
Bluetooth links. In addition, two of the PANs are connected to 
an IP backbone network, one via a LAN access point and the 
other via a single GPRS/UMTS phone. 

In traditional 802.11 networks, clients dictate the timing of
communication, and APs do not coordinate with one another
(client-controlled communication): the clients choose
when to connect and which AP to connect with, and APs
choose when to respond to each client. With an
infrastructure-controlled approach, by contrast, the WLAN can
decide which AP or client transmits when, and can guarantee
packet delivery while dynamically reserving bandwidth over the
air for VoIP communication.


4.3 COMMERCIAL SECTOR-BLUETOOTH NETWORKING 
Worldwide, the industry has shown a tremendous interest in 
techniques that provide short-range wireless connectivity. In 
this context, Bluetooth technology is seen as the key 
component. [5] However, Bluetooth technology must be able to
operate in ad hoc networks that can be stand-alone, or part of
the IP-networked world, or a combination of the two.

Bluetooth devices can interact with one or more other devices
in several different ways. The simplest scheme involves only
two devices: one of the devices acts as the master
and the other as a slave. This ad hoc network is called a
piconet. A piconet can consist of a maximum of eight devices. The
interconnection of piconets is called a scatternet. The main
purpose of Bluetooth is to replace cables between electronic 
devices, such as telephones, PDAs, laptop computers, digital 
cameras, printers, and fax machines, by using a low-cost radio 
chip. Short-range connectivity also fits nicely into the wide-area
context, in that it can extend IP networking into the
personal-area network domain, as discussed earlier. Bluetooth must be
able to carry IP efficiently in a PAN, since PANs will be
connected to the Internet via UMTS or corporate LANs, and
will contain IP-enabled hosts. [5] Generally speaking, a good
capacity for carrying IP would give Bluetooth networks a wider 
and more open interface, which would most certainly boost the 
development of new applications for Bluetooth. In February 
1998, the Bluetooth Special Interest Group (SIG) was founded 
to promote, develop and define the Bluetooth specification. The 
Bluetooth SIG aims at delivering a universal solution for
connectivity among the heterogeneous devices. This is one of 
the first commercial realizations of ad-hoc wireless networking. 


5. TECHNICAL CHALLENGES IMPOSED BY AD-HOC
NETWORKING 

This section outlines the technical requirements that mobile
ad hoc networks must meet to achieve their potential, given that
ad hoc wireless networks are self-creating, self-organizing, and
self-administering. It also outlines the need for QoS.


5.1 GEOCASTING 

Geocasting is a variant of the conventional multicasting 
problem. For multicasting, conventional protocols define a 
multicast group as a collection of hosts which register to a 
multicast group address. However, for geocasting, the group 
consists of the set of all nodes within a specified geographical 
region. Hosts within the specified region at a given time form 
the geocast group at that time. [9] 

When an application must send the same information to more 
than one destination, multicasting is often used, because it is 
much more advantageous than multiple unicasts in terms of the 
communication costs. Cost considerations are all the more 
important for a mobile ad hoc network (MANET) consisting of 
mobile hosts that communicate with each other over wireless 
links, in the absence of a fixed infrastructure. In MANET
environments, the multicast problem is more complex because 
topology change of the network is extremely dynamic and 
relatively unpredictable. To do multicasting, some way is 
needed to define multicast groups. In conventional multicasting 
algorithms, a multicast group is considered as a collection of 
hosts which register to that group. It means that, if a host wants 
to receive a multicast message, it has to join a particular group 
first. In order to send a message to the multicast group, a host 
just needs to multicast the message to the address of that group. 
All the group members then receive the message. [6] 

In geocasting, the message (geocast message) is delivered to
the set of nodes within a specified geographical area. Unlike 
the traditional multicast schemes, here, the multicast group (or 
geocast group) is implicitly defined as the set of nodes within a 
specified area. [9] 
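
The implicit group definition can be illustrated with a minimal sketch. It assumes a circular geocast region and nodes that know their own planar coordinates; the `Node` class, the `geocast_group` helper, and the coordinate values are hypothetical names introduced only for illustration and do not come from any particular geocast protocol.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    x: float  # position in metres (hypothetical planar coordinates)
    y: float


def geocast_group(nodes, center, radius):
    """Return the IDs of the nodes currently inside a circular geocast region.

    Membership is implicit: a node belongs to the group simply because its
    present position lies within the region, with no join/registration step.
    """
    cx, cy = center
    return [n.node_id for n in nodes
            if math.hypot(n.x - cx, n.y - cy) <= radius]


nodes = [Node("A", 0.0, 0.0), Node("B", 30.0, 40.0), Node("C", 200.0, 0.0)]
group = geocast_group(nodes, center=(0.0, 0.0), radius=50.0)
print(group)
```

Because membership depends only on position at delivery time, re-evaluating `geocast_group` after nodes move yields the geocast group "at that time", as described above.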

This section briefly explained the problem of geocasting, that is,
broadcasting to every node in a specified geographical area, in
mobile ad hoc environments.

The basic routing philosophy on the Internet is “best-effort”; 
there are several requirements for it that are explored in the 
next section. 


5.2 ENSURING QOS 

This section addresses some of the quality of service issues for 
ad hoc networks which have recently started to receive 
increasing attention in the literature. The focus is on QoS 
routing. This is a complex and difficult issue because of the 
dynamic nature of the network topology and generally 
imprecise network state information. [2] 

Quality of Service (QoS) refers to the ability of a network to 
provide better, more predictable service to selected network 
traffic over various underlying technologies, including IP- 
routed networks. QoS features are implemented in network 
routers by supporting dedicated bandwidth, improving loss 
characteristics, avoiding and managing network congestion, 
shaping network traffic, and setting traffic priorities across the 
network. 

The notion of QoS is a guarantee by the network to satisfy a set 
of predetermined service performance constraints for the user 
in terms of the end-to-end delay statistics, available bandwidth, 
probability of packet loss, and so on. QoS guarantees can be 
attained only with appropriate resource reservation techniques.
The most important element among them is QoS routing, that 
is, the process of choosing the routes to be used by the flow of 
packets of a logical connection in attaining the associated QoS 
guarantee. The cost of transport and total network throughput 
may be included as parameters. Obviously, enough network 
resources must be available during the service invocation to 
honor the guarantee. The first essential task is to find a suitable 
path through the network, or route, between the source and 
destination(s) that will have the necessary resources available
to meet the QoS constraints for the desired service. The task of 
resource (request, identification, and) reservation is the other 
indispensable ingredient of QoS. By QoS routing, we mean 
both these tasks together. QoS routing offers serious challenges 
even for today's Internet. Different service types (e.g., voice, 
live video, and document transfer) have significantly different 
objectives for delay, bandwidth, and packet loss. [3] 

Three distinct route-finding techniques are used for
determining an optimal path satisfying the QoS constraints. 
These are source routing, destination routing, and hierarchical 
routing. In source routing, a feasible path is locally computed 
at the source node using the locally stored global state 
information, and then all other nodes along this feasible path 
are notified by the source of their adjacent preceding and 
successor nodes. In distributed or hop-by-hop routing, the 
source as well as other nodes is involved in path computation
by identifying the adjacent router to which the source must
forward the packet associated with the flow. Hierarchical
routing, as the name suggests, uses the aggregated partial
global state information to determine a feasible path using
source routing where the intermediate nodes are actually
logical nodes representing a cluster. Flooding is not an option
for QoS routing, except for broadcasting control packets under
appropriate circumstances (e.g., for beaconing, or at the start of
a route discovery process).
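
The source-routing case can be sketched as follows. This is only an illustration under simplifying assumptions: the source is assumed to hold a complete, accurate map of free link bandwidth (which the text notes is generally imprecise in an ad hoc network), bandwidth is the only QoS constraint, and links are bidirectional. The function name `feasible_path` and the sample topology are hypothetical.

```python
from collections import deque


def feasible_path(links, source, destination, required_bw):
    """Find a fewest-hop path whose every link has enough free bandwidth.

    links: dict mapping (u, v) -> free bandwidth, treated as bidirectional.
    Returns the path as a list of nodes, or None if no feasible path exists.
    """
    # Step 1: prune links that cannot carry the requested flow.
    adjacency = {}
    for (u, v), bw in links.items():
        if bw >= required_bw:
            adjacency.setdefault(u, []).append(v)
            adjacency.setdefault(v, []).append(u)

    # Step 2: breadth-first search on the pruned graph, computed at the source.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


links = {("S", "A"): 2.0, ("A", "D"): 2.0, ("S", "B"): 5.0,
         ("B", "C"): 5.0, ("C", "D"): 5.0}
print(feasible_path(links, "S", "D", 1.0))
print(feasible_path(links, "S", "D", 4.0))
print(feasible_path(links, "S", "D", 6.0))
```

With a low bandwidth demand the short route through A is chosen; a higher demand forces the longer route through B and C; a demand above every link's free bandwidth yields no feasible path, which is the point at which admission would be refused.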

This section briefly described the new but rapidly growing area
of research on guaranteeing QoS in ad hoc mobile wireless
networks.


6. WHERE DO WE GO FROM HERE: POSSIBLE
SOLUTION TO THE CHALLENGES

The volume of research activity on wireless mobile ad hoc
networks, in both academia and industry, reflects how well
their tremendous potential is now being recognized. More and
more results are appearing on
problems related to basic network limitations, new protocols
and their performance evaluations, network architecture and
design, new technologies, and so on.

For increased network reliability and enhanced QoS, it is
necessary to develop and implement efficient routing algorithms
and protocols. What is needed is an algorithm that dynamically
calculates the route over which to forward and transfer data
reliably, whether within the ad hoc network or
to a node that wishes to communicate with the wider Internet.
Such an algorithm will achieve multicast efficiency by tracking
the availability of resources for each node within its
neighborhood. Computation of free bandwidth will be based on
reservations made for ongoing sessions and the requirements
reported by the neighbors. The algorithm will proactively
choose the next node on the route and generate its routing
table spontaneously.

Because an ad hoc network is highly dynamic, and transmissions are
susceptible to fades, interference, and collisions from
hidden/exposed stations, the algorithm will provide
routes that are most likely to satisfy the bandwidth
requirement of a flow for as long as the route remains established.

Also, the algorithm will dynamically re-establish routes for
ongoing connections upon link failures and topology changes
in the ad hoc network. This will make it easier to achieve
efficient resource utilization and to execute critical applications.
Moreover, the algorithm must be optimized in order to
minimize its computational complexity and hence achieve better
results within hardware constraints such as power.

The seamless integration of mobile ad hoc networks with other
wireless networks and fixed infrastructures will be an essential
part of the evolution towards future fourth generation
communication networks. From a technological point of view,
the realization of this vision still requires a large number of
challenges to be solved related to devices, protocols,
applications and services.


CONCLUSION

The objectives of this paper have been to examine the history
of ad hoc networking and various applications of MANETs,
especially in the commercial sector; to indicate the significance
of the role of geocasting in MANETs; and to present QoS
guarantees in ad hoc mobile wireless networks as a new but
rapidly growing area of research, as a technical challenge, and
as a necessary requirement for the growth of ad hoc networks.
MANETs have evolved a great deal over the two decades since
their inception. Although the technology has until now been
confined to the military arena, it is currently gaining traction in
the commercial domain. The technology at present
demands renewed attention owing to recent developments in 
radio communications and advancements in wireless 
networking. The proliferation of unmanned aerial systems 
(UAS) over the last decade is one of the most significant 
drivers for the increased deployment of MANETs in the 
battlefield. 


FUTURE SCOPE 

Guaranteeing QoS in such a network may be impossible if the 
nodes are too mobile. The challenges increase even more for 
those ad hoc networks that, like their conventional wireless 
counterparts, support both best effort services and those with 
QoS guarantees, allow different classes of service, and are 
required to interwork with other wireless and wireline
networks, both connection-oriented and connectionless. 
Algorithms, policies, and protocols for coordinated admission 
control, resource reservation, and routing for QoS under such 
models are only beginning to receive attention. QoS for ad hoc 
networks is a new area of research. 

Much work remains to be done on cost-effective 
implementation issues to bring the promise of ad hoc networks 
within the reach of the public. 


ABSTRACT
Software reliability growth models (SRGMs) are used to assess
modular software quantitatively and predict the reliability of
each of the modules during the module testing phase. In the last
few decades various SRGMs have been proposed in the literature.
However, it is difficult to select the best model from the plethora
of models available. To reduce this difficulty, unified modeling
approaches have been proposed by many researchers. In this
paper we present a generalized framework for software
reliability growth modeling with respect to testing effort
expenditure and incorporate faults of different severity. We
have used different standard probability distribution functions
for representing failure observation and fault detection/
correction times. The faults in the software are labeled as
simple, hard and complex faults. Developing reliable modular
software is necessary. But, at the same time, the testing effort
available during the testing time is limited. Consequently, it is
important for the project manager to allocate these limited
resources among the modules optimally during the testing
process. In this paper we have formulated an optimization
problem in which the total number of faults removed from
modular software (which includes simple, hard and complex
faults) is maximized subject to budgetary and reliability
constraints. To solve the optimization problem we have used a
genetic algorithm. One numerical example has been discussed
to illustrate the solution of the formulated optimal effort
allocation problem.


Keywords: Non-homogeneous Poisson process, software
reliability growth model, Probability Distribution Functions,
Fault Severity, Genetic Algorithm.


1. INTRODUCTION

Nowadays large and complex software systems are developed
by integrating a number of small and independent modules.
Modules can be visualized as independent software units
performing predefined tasks, mostly developed by separate
teams of programmers and sometimes at different geographical
locations. During the development of modular software, faults
can creep into the modules due to human imperfection. These
faults manifest themselves in terms of failures when the
modules are tested independently during the module testing
phase of the software development life cycle. In today's
computer-invaded world these failures can lead to big losses in
terms of money, time and life. Thus it is very important to
evaluate the software reliability of each module during the
module testing phase.

To assess modular software quantitatively and predict the 
reliability of each of the modules during module testing, 
software reliability growth models (SRGM) are used. 
Numerous SRGM’s, which relate the number of failures (fault 
identified) and the Execution time (CPU time/Calendar time) 
have been discussed in the literature [19,5,3]. All these SRGMs 
assume that the faults in the software are of the same type. 
However, this assumption is not truly representative of reality. 
The software includes different types of faults, and each fault 
requires different strategies and different amounts of testing 
effort for removal. Ohba [8] refined the Goel-Okumoto[1] 
model by assuming that the fault detection/removal rate 
increases with time and that there are two types of faults in the 
software. The SRGMs proposed by Bittanti et al. [22] and Kapur and
Garg [13] have forms similar to that of Ohba [8] but were
developed under different sets of assumptions. These models
can describe both exponential and S-shaped growth curves and
therefore are termed flexible models [22, 8, 13]. Kapur et al.
[16] developed a flexible software reliability growth model with
a testing-effort-dependent learning process in which two types of
software faults were taken. Further, they proposed an SRGM
with three types of faults [19]. The first type of fault was
modeled by the Exponential model of Goel and Okumoto [1].
The second type was modeled by Delayed S-shaped model of 
Yamada et al. [21]. The third type was modeled by a three- 
stage Erlang model proposed by Kapur et al. [19]. The total 
removal phenomenon was modeled by the superposition of the
three SRGMs. Shatnawi and Kapur [11] later proposed a
generalized model based on classification of the faults in the
software system according to their removal complexity.

The above literature review reveals that in the last few decades
several SRGM’s have been proposed. This plethora of SRGM’s 
makes the model selection a tedious task. To reduce this 
difficulty, unified modeling approaches have been proposed by 
many researchers. The work in this area started as early as in 
1980s with Shantikumar [4] proposing a Generalized birth 
process model. Gokhale and Trivedi [23] used Testing 
coverage function to present a unified framework and showed 
how NHPP based models can be represented by probability 
distribution functions of fault-detection times. Another
unification methodology is based on a systematic study of Fault 
detection process (FDP) and Fault correction process (FCP) 
where FCPs are described by detection process with time delay. 
The idea of modeling FCP as a separate process following the
FDP was first used by Schneidewind [10]. More general 
treatment of this concept is due to Xie et al [9] who suggested 
modeling of Fault detection process as a NHPP based SRGM 
followed by Fault correction process as a delayed detection 
process with random time lag. The unification scheme due to 
Kapur et al [17] is based on Cumulative Distribution Function 
for the detection/correction times and incorporates the concept 
of change point in Fault detection rate. These schemes have 
proved to be fruitful in obtaining several existing SRGM by 
following single methodology and thus present a perceptive 
investigation for the study of general models without making 
many assumptions. In this paper we make use of such a unified
scheme to present a generalized framework for software
reliability growth modeling with respect to testing effort
expenditure, incorporating faults of different severity. We
have used different standard probability distribution functions
for representing failure observation and fault correction times.
Also, the faults in the software are labeled as
simple, hard and complex faults. It is assumed that the testing
phase consists of three different processes, namely failure
observation, fault isolation and fault removal. The time delay
between the failure observation and subsequent removal is
assumed to represent the severity of the fault.

Developing reliable modular software is necessary. But, at the
same time, the testing effort available during the testing time is
limited. These testing efforts include resources like human 
power, CPU hours, and elapsed time, etc. Hence, to develop a 
good reliable software system, a project manager must 
determine in advance how to effectively allocate these 
resources among the various modules. Such optimization 
problems are called “Resource Allocation problems”. Many 
authors have investigated the problem of resource allocation [2, 
7]. Kapur et al. [20, 15] studied various resource allocation
problems maximizing the number of faults removed from each
module under constraints on budget and management
aspirations on reliability for exponential and S-shaped SRGMs
[1, 19, 8]. In this paper we have formulated an optimization
problem in which the total number of faults removed from
modular software (which includes simple, hard and complex
faults) is maximized subject to budgetary and reliability
constraints.

To solve the effort allocation problem formulated in this
research paper we use a Genetic Algorithm (GA). GA stands out as a
powerful tool for solving search and optimization problems. The
complex nonlinear formulation of the optimal effort allocation
problem is the reason behind choosing a genetic algorithm as the
solving tool. GA always considers a population of solutions.
GAs impose no particular requirement (such as linearity or
differentiability) on the problem, so they can be applied to a
very wide range of problems.

The paper is organized as follows. Section 2 gives the 
generalized framework for developing the software reliability 
growth model for faults of different severity. In section 3 
parameter estimation and model validation of the proposed 
model is done through SPSS. The testing effort allocation
problem is formulated in section 4. In section 5 a genetic
algorithm is presented for solving the discussed problem.
Section 6 illustrates the optimization problem solution through
a numerical example. Finally, conclusions are drawn and are
given in section 7.

2.1 NOTATIONS

$W(t)$ : Cumulative testing effort in the interval $(0, t]$.

$w(t)$ : Current testing-effort expenditure rate at testing time $t$, with $\frac{d}{dt}W(t) = w(t)$.

$m_j(W_t)$ : Expected number of faults removed of type $j$ ($j$ = Simple, Hard, Complex faults).

$m(W_t)$ : Expected number of total faults removed.

$b$ : Constant fault detection rate.

$\beta$ : Rate of consumption of testing effort.

$\lambda(W_t)$ : Intensity function for the Fault correction process (FCP), or fault correction rate per unit time.

$G(W_t), F(W_t), H(W_t)$ : Testing effort dependent Probability Distribution Functions for Failure observation, Fault Detection and Fault Correction times.

$g(W_t), f(W_t), h(W_t)$ : Testing effort dependent Probability Density Functions for Failure observation, Fault Detection and Fault Correction times.

$*$ : Convolution.

$\otimes$ : Stieltjes convolution.


2.2 BASIC ASSUMPTIONS 

The proposed model is based upon the following basic
assumptions:

1. Failure occurrence, fault detection, or fault removal
phenomenon follows NHPP.

2. Software is subject to failures during execution caused by
faults remaining in the software.

3. The faults existing in the software are of three types: simple,
hard and complex. They are distinguished by the amount of
testing effort needed to remove them.

4. Fault removal process is perfect and failure observation/fault
isolation/fault removal rate is constant.

5. Each time a failure occurs, an immediate effort takes place to
decide the cause of the failure in order to remove it. The time
delay between the failure observation and its subsequent
fault removal is assumed to represent the severity of the
faults. The more severe the fault, the longer the time delay.

6. The fault isolation/removal rate with respect to testing effort
intensity is proportional to the number of observed failures.


2.3 MODELING TESTING EFFORT 


The SRGM proposed in this paper takes into account the
time-dependent variation in testing effort. The testing effort
(resources) that govern the pace of testing for almost all
software projects are manpower and computer time.

To describe the behavior of testing effort, an Exponential,
Rayleigh, or Weibull function has been used.

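The testing effort described by a Weibull-type distribution is
given by:

$$W(t) = \alpha\left[1 - \exp\left(-\int_0^t g(\tau)\,d\tau\right)\right] \qquad (1)$$

In equation (1), if $g(t) = \beta$, then there is an exponential
curve, and the cumulative testing effort in $(0, t]$ is

$$W(t) = \alpha\left[1 - \exp(-\beta t)\right] \qquad (2)$$

Similarly, in (1), if $g(t) = \beta t$, then there is a Rayleigh
curve and the cumulative testing effort is given by:

$$W(t) = \alpha\left\{1 - \exp\left[-\frac{\beta}{2}t^{2}\right]\right\} \qquad (3)$$

And if $g(t) = \gamma\beta t^{\gamma-1}$ in (1), then

$$W(t) = \alpha\left(1 - \exp\left[-\beta t^{\gamma}\right]\right) \qquad (4)$$

which is the cumulative testing effort of the Weibull curve.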
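
The three testing-effort curves can be evaluated with a short sketch. It takes the forms of Eqs. (2) to (4) to be W(t) = α[1 − exp(−βt)], W(t) = α[1 − exp(−βt²/2)] and W(t) = α[1 − exp(−βt^γ)] respectively; the function names and the parameter values used below are illustrative assumptions, not the estimates reported in the paper.

```python
import math


def weibull_effort(t, alpha, beta, gamma):
    """Cumulative testing effort for the Weibull curve, Eq. (4)."""
    return alpha * (1.0 - math.exp(-beta * t ** gamma))


def exponential_effort(t, alpha, beta):
    """Eq. (2): the Weibull curve with shape parameter gamma = 1."""
    return weibull_effort(t, alpha, beta, 1.0)


def rayleigh_effort(t, alpha, beta):
    """Eq. (3): obtained from g(t) = beta * t, giving exp(-beta * t**2 / 2)."""
    return alpha * (1.0 - math.exp(-beta * t ** 2 / 2.0))


# Hypothetical parameters: alpha is the total effort eventually consumed.
for week in (1, 5, 10, 19):
    print(week,
          round(exponential_effort(week, 50.0, 0.1), 3),
          round(rayleigh_effort(week, 50.0, 0.02), 3),
          round(weibull_effort(week, 50.0, 0.05, 1.5), 3))
```

Note that the Rayleigh curve is the Weibull curve with γ = 2 and rate β/2, so a single Weibull routine covers all three cases.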


2.4 MODEL DEVELOPMENT

Let $a_1$, $a_2$ and $a_3$ be the simple, hard and complex fault
contents respectively at the beginning of testing. Also, $a$ is the total
fault content, i.e. $a = a_1 + a_2 + a_3$.


2.4.1 MODELING SIMPLE FAULTS 

Simple faults are the faults which can be removed instantly as
soon as they are observed. The mean value function for the
simple faults of the software reliability growth model with
respect to testing effort expenditure can be written as [18]:

$$m_1(W_t) = a_1 F(W_t) \qquad (5)$$

where $F(W_t)$ is the testing effort dependent distribution
function.

From Equation (5), the instantaneous failure intensity
function $\lambda(W_t)$ is given by:

$$\lambda(W_t) = a_1 F'(W_t) \qquad (6)$$

Or we can write

$$\lambda(W_t) = \frac{F'(W_t)}{1 - F(W_t)}\left[a_1 - m_1(W_t)\right] \qquad (7)$$


2.4.2 MODELING HARD FAULTS 

The hard faults consume more testing time for removal.
This means that the testing team will have to spend more time
to analyze the cause of the failure and therefore requires greater
time to remove them. Hence the removal process for hard faults
is modeled as a two-stage process and is given by [18]:

$$m_2(W_t) = a_2\,(F \otimes G)(W_t) \qquad (8)$$

and

$$\lambda(W_t) = \frac{(f * g)(W_t)}{1 - (F \otimes G)(W_t)}\left[a_2 - m_2(W_t)\right] \qquad (9)$$
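
As a worked instance of the two-stage process, assume (for illustration only) that both stages are exponential with a common rate $b$, i.e. $F(W) = G(W) = 1 - e^{-bW}$, which is the hard-fault case listed in Table 2.1. The Stieltjes convolution can then be evaluated in closed form:

```latex
\begin{align*}
(F \otimes G)(W_t) &= \int_0^{W_t} G(W_t - x)\, dF(x)
  = \int_0^{W_t} \left[1 - e^{-b(W_t - x)}\right] b\, e^{-bx}\, dx \\
 &= \left(1 - e^{-bW_t}\right) - b\,W_t\, e^{-bW_t}
  = 1 - (1 + bW_t)\, e^{-bW_t}, \\[4pt]
m_2(W_t) &= a_2\left[1 - (1 + bW_t)\, e^{-bW_t}\right], \\[4pt]
\frac{(f * g)(W_t)}{1 - (F \otimes G)(W_t)}
 &= \frac{b^{2} W_t\, e^{-bW_t}}{(1 + bW_t)\, e^{-bW_t}}
  = \frac{b^{2} W_t}{1 + bW_t}.
\end{align*}
```

The resulting mean value function is the delayed S-shaped form, and the removal rate per remaining fault, $b^{2}W_t/(1+bW_t)$, rises from 0 towards $b$ as effort accumulates, which reflects the extra delay before a hard fault is removed.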


2.4.3 MODELING COMPLEX FAULTS 

These faults require more testing time for removal after
isolation as compared to hard fault removal. Hence they need
to be modeled with a greater time lag between failure
observation and removal. Thus, the removal process for
complex faults is modeled as a three-stage process:

$$m_3(W_t) = a_3\,(F \otimes G \otimes H)(W_t) \qquad (10)$$

And the instantaneous failure intensity function $\lambda(W_t)$ is:

$$\lambda(W_t) = \frac{(f * g * h)(W_t)}{1 - (F \otimes G \otimes H)(W_t)}\left[a_3 - m_3(W_t)\right] \qquad (11)$$


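2.4.4 MODELING TOTAL FAULTS
The total fault removal phenomenon is the superposition of
the simple, hard and complex faults, and is therefore given as:

$$m(W_t) = m_1(W_t) + m_2(W_t) + m_3(W_t) = a_1 F(W_t) + a_2 (F \otimes G)(W_t) + a_3 (F \otimes G \otimes H)(W_t) \qquad (12)$$

A particular case of the proposed model is tabulated in Table
2.1.

Fault type | $F(W_t)$ | $G(W_t)$ | $H(W_t)$
Simple | $W_t \sim \exp(b_1)$ | - | -
Hard | $W_t \sim \exp(b_2)$ | $W_t \sim \exp(b_2)$ | -
Complex | $I(W_t)$ | $I(W_t)$ | $W_t \sim N(\mu, \sigma^2)$

MVF of total faults:

$$m(W_t) = a_1\left[1 - e^{-b_1 W_t}\right] + a_2\left[1 - (1 + b_2 W_t)e^{-b_2 W_t}\right] + a_3\left[\Phi(W_t, \mu, \sigma^2)\right]$$

Table 2.1: A Particular Case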
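
The particular case of Table 2.1 can be evaluated numerically with a minimal sketch. It assumes the total mean value function is a1[1 − exp(−b1 W)] + a2[1 − (1 + b2 W) exp(−b2 W)] + a3 Φ(W; μ, σ²), with Φ the normal cumulative distribution function; the function names and all parameter values are hypothetical and are not the estimates obtained for DS-I.

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal CDF Phi(x; mu, sigma^2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mvf_simple(W, a1, b1):
    """Simple faults: one exponential stage."""
    return a1 * (1.0 - math.exp(-b1 * W))


def mvf_hard(W, a2, b2):
    """Hard faults: two exponential stages (delayed S-shaped form)."""
    return a2 * (1.0 - (1.0 + b2 * W) * math.exp(-b2 * W))


def mvf_complex(W, a3, mu, sigma):
    """Complex faults: normally distributed removal effort."""
    return a3 * normal_cdf(W, mu, sigma)


def mvf_total(W, a, b1, b2, mu, sigma):
    """Expected total faults removed after consuming testing effort W."""
    a1, a2, a3 = a
    return (mvf_simple(W, a1, b1) + mvf_hard(W, a2, b2)
            + mvf_complex(W, a3, mu, sigma))


# Hypothetical fault contents and rates, for illustration only.
for W in (5.0, 20.0, 47.65):
    print(W, round(mvf_total(W, (50.0, 30.0, 20.0), 0.1, 0.08, 20.0, 5.0), 2))
```

As the effort grows, the total approaches a1 + a2 + a3, the total initial fault content.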

2.5 RELIABILITY EVALUATION

Using the SRGM we can evaluate the reliability of the software
during the progress of testing and predict the reliability at the
release time. Reliability of software is defined as “given that
the testing has continued up to time t, the probability that a
software failure does not occur in the time
interval $(t, t + \Delta t)$ $(\Delta t \ge 0)$”. Hence the reliability of software
is represented mathematically as

$$R(t) \equiv R(t + \Delta t \mid t) = \exp\left[-\left(m(t + \Delta t) - m(t)\right)\right] \qquad (13)$$

Another measure of software reliability at time t is defined as
“the ratio of the cumulative number of detected faults at time t
to the expected number of initial fault content of the software”,
given by [4]:

$$R(t) = \frac{m(t)}{a} \qquad (14)$$

To incorporate the effect of testing effort in the reliability
estimation of each module, Equation (14) can be modified as:

$$R(W) = \frac{m(W)}{a} \qquad (15)$$
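
Both reliability measures can be computed with a few lines of code. The sketch assumes the usual NHPP form exp[−(m(t + Δt) − m(t))] for the conditional reliability and the ratio m/a for the second measure; the exponential mean value function `go_mvf` and its parameters are hypothetical stand-ins used only to exercise the two functions.

```python
import math


def conditional_reliability(m, t, dt):
    """Probability of no failure in (t, t + dt) for an NHPP with MVF m."""
    return math.exp(-(m(t + dt) - m(t)))


def reliability_ratio(m_value, a):
    """Fraction of the initial fault content removed so far, m / a."""
    return m_value / a


def go_mvf(t, a=100.0, b=0.1):
    """Hypothetical exponential mean value function a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


print(conditional_reliability(go_mvf, 10.0, 1.0))
print(reliability_ratio(go_mvf(10.0), 100.0))
```

The ratio measure increases monotonically towards 1 as faults are removed, which makes it convenient as the per-module reliability constraint in the allocation problem.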


3. PARAMETER ESTIMATION AND MODEL VALIDATION
To measure the performance of the proposed model we have
carried out the parameter estimation on the data set cited in
M. Ohba [8] (DS-I). The software was tested for 19 weeks
during which 47.65 computer hours were used and 328 faults
were removed. The estimation results for the Exponential,
Rayleigh, and Weibull functions are given in Table 3.1.

Table 3.1: Testing Effort Function Parameter Estimates (Parameter Estimation for DS-I)


The Weibull effort function is chosen to represent the testing effort
as it provided the best fit on the testing effort data (based on the
highest value of R²). Based upon these estimated parameters, the
parameters of the proposed SRGM were estimated. The goodness
of fit measures used are Mean Square Error (MSE) and
Coefficient of multiple determination (R²). The results are
compared with the SRGM proposed by Kapur et al. [19] with
three types of fault. The results are tabulated in Table 3.2
(letting b1 = b2 = b3 = b). The goodness of fit curve for DS-I is
given in Figure 3.1.

Table 3.2: Proposed Model | Kapur et al. Model [19]

Figure 3.1: Goodness of Fit Curve for DS-I

4. TESTING RESOURCE ALLOCATION PROBLEM 


4.1 NOTATIONS:

$j$ : 1, 2, 3; Simple faults = 1, Hard faults = 2, Complex faults = 3

$i$ : Module, 1, 2, ..., N

$N$ : Total number of modules

$m_i(W_i)$ : Mean value function for the i-th module

$b_{ij}$ : Constant fault detection rate for the j-th fault type in the i-th module

$a_{ij}$ : Constant, representing the number of faults of the j-th type lying dormant in the i-th module at the beginning of testing

$c_{ij}$ : Cost of removing a fault of the j-th type from the i-th module

$W_i$ : Testing effort for the i-th module

$R_i$ : Reliability of each module

$B$ : Total cost of removing different types of faults

$W$ : Total testing effort expenditure


4.2 MATHEMATICAL FORMULATION
Consider software with N modules, where each module is
different in size, complexity, the functions it performs, etc. In
each module there are three types of faults: simple, hard and
complex. The software has to be released in the market at a
predefined software release time with limited availability of
testing resource expenditure. Further, the cost of removing a
fault from each module is dependent on its severity.

Therefore, the problem of maximizing the faults removed from each of N
independent modules such that the reliability of each module is at
least $R_0$ is formulated as:

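Maximize

$$\sum_{i=1}^{N} a_{i1}\left(1 - e^{-b_{i1}W_i}\right) + \sum_{i=1}^{N} a_{i2}\left(1 - (1 + b_{i2}W_i)e^{-b_{i2}W_i}\right) + \sum_{i=1}^{N} a_{i3}\,\Phi(W_i, \mu, \sigma^2)$$

Subject to:

$$\sum_{i=1}^{N}\left(c_{i1}m_{i1}(W_i) + c_{i2}m_{i2}(W_i) + c_{i3}m_{i3}(W_i)\right) \le B$$

$$\sum_{i=1}^{N} W_i \le W$$

$$R_i \ge R_0, \quad i = 1, 2, \ldots, N \qquad \text{(P1)}$$

$$W_i \ge 0, \quad i = 1, 2, \ldots, N$$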


5. GENETIC ALGORITHM FOR TESTING RESOURCE
ALLOCATION

The above optimization problem is solved by a powerful
computerized heuristic search and optimization method, viz. the
genetic algorithm (GA), which is based on the mechanics of
natural selection and natural genetics. In each iteration (called a
generation), three basic genetic operations, i.e.,
selection/reproduction, crossover and mutation, are executed.

For implementing the GA in solving the allocation problem, the 
following basic elements are to be considered. 


5.1 CHROMOSOME REPRESENTATION 

Genetic Algorithm starts with the initial population of solutions 
represented as chromosomes. A chromosome comprises genes 
where each gene represents a specific attribute of the solution. 
Here the solution of the testing-effort allocation problem in 
modular software system includes the effort resources 
consumed by individual modules. Therefore, a chromosome is 
a set of modular testing effort consumed as part of the total 
testing effort availability. 


5.2 INITIAL POPULATION 

For a given total testing time W, GA generates the initial
population randomly. Each gene is initialized to a random value within the
limits of its variable.


5.3 FITNESS OF A CHROMOSOME

The fitness is a measure of the quality of the solution it 
represents in terms of various optimization parameters of the 
solution. A fit chromosome suggests a better solution. In the 
effort allocation problem, the fitness function is the objective of 
testing effort optimization problem along with the penalties of 
the constraints that are not met. 
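
The penalty idea described above can be sketched as follows. This is only an illustration: the two-module parameters, the exponential mean value function, the cost of 5 units per fault, the reliability proxy and the penalty weight are all assumptions made for the example and are not taken from the paper.

```python
import math

# Hypothetical two-module parameters, for illustration only.
A = [100.0, 80.0]    # assumed initial fault content per module
RATE = [0.01, 0.02]  # assumed fault detection rate per module


def toy_faults_removed(i, w):
    """Assumed exponential mean value function for module i."""
    return A[i] * (1.0 - math.exp(-RATE[i] * w))


def toy_removal_cost(i, w):
    """Assumed cost model: 5 units per removed fault."""
    return 5.0 * toy_faults_removed(i, w)


def toy_reliability(i, w):
    """Assumed reliability proxy: fraction of faults removed."""
    return toy_faults_removed(i, w) / A[i]


def penalized_fitness(efforts, total_effort, budget, min_reliability,
                      penalty=1.0e6):
    """Objective value minus a penalty for every violated constraint."""
    modules = range(len(efforts))
    objective = sum(toy_faults_removed(i, efforts[i]) for i in modules)
    cost = sum(toy_removal_cost(i, efforts[i]) for i in modules)
    violation = max(0.0, sum(efforts) - total_effort)   # effort constraint
    violation += max(0.0, cost - budget)                # budget constraint
    violation += sum(max(0.0, min_reliability - toy_reliability(i, efforts[i]))
                     for i in modules)                  # reliability constraint
    violation += sum(max(0.0, -w) for w in efforts)     # non-negativity
    return objective - penalty * violation


feasible = penalized_fitness([300.0, 200.0], 500.0, 1000.0, 0.9)
infeasible = penalized_fitness([400.0, 200.0], 500.0, 1000.0, 0.9)
```

With a large penalty weight, any allocation that breaks a constraint ranks below every feasible allocation, so selection gradually pushes the population into the feasible region.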


5.4 SELECTION 

Selection is the process of choosing two parents from the 
population for crossover. The higher the fitness function, the 
more chance an individual has to be selected. 

The selection pressure drives the GA to improve the population
fitness over successive generations. Selection has to be
balanced with variation from crossover and mutation. Too
strong a selection means that suboptimal but highly fit individuals will
take over the population, reducing the diversity needed for
change and progress; too weak a selection will result in too slow
evolution. We use “Tournament selection” here.
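
A minimal sketch of tournament selection is given below. The function name, the tournament size of 2 and the sample population are assumptions for illustration; the paper does not state which tournament size was used.

```python
import random


def tournament_select(population, fitness_values, tournament_size=2, rng=random):
    """Return the fittest of `tournament_size` randomly drawn chromosomes."""
    # Draw distinct contenders by index, then keep the one with the best fitness.
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda idx: fitness_values[idx])
    return population[winner]


# Hypothetical population of three effort allocations and their fitness values.
population = [[100.0, 400.0], [250.0, 250.0], [400.0, 100.0]]
fitness_values = [1.0, 3.0, 2.0]
parent = tournament_select(population, fitness_values, tournament_size=3)
```

A larger tournament size increases the selection pressure, while a size of 1 reduces to uniform random selection.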


5.5 CROSSOVER 

Crossover is the process of taking two parent solutions and
producing two offspring chromosomes by swapping sets of
genes, in the hope that at least one child will have genes that
improve its fitness. In the testing resource allocation problem,
crossover diversifies the population by swapping modules with
distinct time consumption, particularly when the population size
is small.


5.6 MUTATION 

Mutation prevents the algorithm from being trapped in a local
optimum. Mutation plays the role of recovering lost
genetic material as well as randomly disturbing genetic
information.

The important parameter in the mutation technique is the
mutation probability. The mutation probability decides how
often parts of a chromosome will be mutated. If there is no
mutation, offspring are generated immediately after crossover
(or directly copied) without any change. In our problem of
testing resource allocation, we have used a mutation probability
of 10%.

With the basic modules of genetic algorithm described above, 
the procedure for solving the optimal effort allocation problem 
is as follows [6]: 

Step 1: Start 

Step 2: Generate random population of chromosomes 

Step 3: Evaluate the fitness of each chromosome in the 
population 

Step 4: Create a new population by repeating following steps 
until the new population is complete: 


[Selection] Select two parent chromosomes from a population
according to their fitness

[Crossover] With a crossover probability, cross over the
parents to form new offspring (children). If no crossover is
performed, the offspring are exact copies of the parents.

[Mutation] With a mutation probability, mutate offspring at
each locus (position in chromosome)

[Accepting] Place new offspring in the new population

[Replace] Use the newly generated population for a further run of the
algorithm.

[Test] If the end condition is satisfied, stop and return the best 
solution in the current population 

[Loop] Go to step 3 for fitness evaluation 
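
The steps above can be sketched as a short loop. This is a simplified illustration rather than the authors' implementation: the single-point crossover, the binary tournament, the crossover probability of 0.8, the fixed number of generations as end condition and the function name run_ga are assumptions; only the 10% mutation probability comes from the text.

```python
import random


def run_ga(fitness, n_genes, upper, pop_size=20, generations=50,
           crossover_prob=0.8, mutation_prob=0.1, seed=0):
    """Maximize `fitness` over chromosomes of n_genes values in [0, upper]."""
    rng = random.Random(seed)
    # Step 2: random initial population within the limits of each variable.
    population = [[rng.uniform(0.0, upper) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Step 3: evaluate the fitness of each chromosome.
        scores = [fitness(c) for c in population]

        def pick():
            # [Selection] binary tournament between two distinct chromosomes.
            i, j = rng.sample(range(pop_size), 2)
            return population[i] if scores[i] >= scores[j] else population[j]

        new_population = []
        while len(new_population) < pop_size:
            p1, p2 = pick(), pick()
            c1, c2 = p1[:], p2[:]
            # [Crossover] single-point swap of gene segments.
            if n_genes > 1 and rng.random() < crossover_prob:
                cut = rng.randrange(1, n_genes)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                # [Mutation] re-draw each locus with probability mutation_prob.
                for locus in range(n_genes):
                    if rng.random() < mutation_prob:
                        child[locus] = rng.uniform(0.0, upper)
                # [Accepting] place the offspring in the new population.
                new_population.append(child)
        # [Replace] the new population is used for the next generation.
        population = new_population[:pop_size]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    # [Test] end condition: fixed number of generations reached.
    return best


def toy_fitness(chromosome):
    """Hypothetical objective with its maximum at 3.0 in every gene."""
    return -sum((g - 3.0) ** 2 for g in chromosome)


solution = run_ga(toy_fitness, n_genes=3, upper=10.0)
```

In the allocation problem, each gene would hold the testing effort of one module and the fitness callable would be the penalized objective of the formulation.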


6. NUMERICAL EXAMPLE 

The Effort Allocation Problem described in section 4 is
illustrated numerically in this section. Consider a software
system consisting of three modules, whose parameters have
already been estimated using software failure data. The
parameter estimates for each module are shown in Table 6.1.
The total testing resources available are assumed to be 5000
units. The total cost for removing the different types of faults is
10000 units. Also, it is desired that the reliability of each
module is at least 0.9.


odule| ata] a | b lalale| # |o 
2 |332| 97 | 76 p.o0234] 5 | 10 | 15 [14.987.123 
_3 |298] 64 | 32 |o.oors| 5 | 10 | 15 j12.4567.654 


Table 6.1: Parameter Estimates for effort allocation problem 









Based on the above information, the problem (P1) is solved
using the genetic algorithm. The parameters used in GA evaluation
are given in Table 6.2.





Table 6.2: Parameter of the GA 


The optimal testing time allocation to each type of fault in each
module, and hence the total faults removed from each module and
the corresponding cost of removal, are shown in Table 6.3.


Cost
Reliability
3622.524





Table 6.3: The optimal testing effort allocation with the
corresponding cost of each module


CONCLUSION

In this paper we have discussed the problem for modular 
software at the unit testing stage. We have made use of unified 
scheme for presenting a generalized framework for Software 
reliability growth modeling with respect to testing effort 
expenditure and incorporated the faults of different severity. 
The faults in each module are of three types-simple, hard and 
complex. Further we have optimally allocated the testing effort 
to each type of fault and the modules and have found out the 
different types of faults removed in the modules with a fixed 
budget and a prerequisite level of reliability. A genetic algorithm
is developed to solve the problem of resource allocation.
A numerical example is discussed to illustrate the solution of the
optimization problem through GA.


FUTURE SCOPE 

The present study is done under the assumption of 
independence of the failures of different modules. In future, 
dependence of the failures from different modules as well as 
the architecture styles and connectors reliability can also be 
studied.


review(s) of the considered manuscripts, was a painstaking process, but it helped
us to ensure that the best of the considered manuscripts are showcased and that
too after undergoing multiple cycles of review, as required.


The nine papers that were finally published were chosen out of seventy six
papers that we received from all over the world for this issue. We understand
that the confirmation of final acceptance, to the authors / contributors,
is sometimes delayed, but we also hope that you concur with us in the fact that
quality review is a time-consuming process and is further delayed if the reviewers
are senior researchers in their respective fields and hence, are hard pressed for
time.


We further take pride in informing our authors, contributors, subscribers and
reviewers that the journal has been indexed with some of the world’s best
international publishers like EBSCO (USA), Cabell's Directory (USA), DOAJ
(Sweden), Google Scholar and J-Gate. It will certainly further increase the
referencing of the papers published in this journal, thereby enhancing the impact
factor.


We wish to express our sincere gratitude to our panel of experts in steering the
considered manuscripts through multiple cycles of review and bringing out the
best from the contributing authors. We thank our esteemed authors for having
their original research work. We would also wish to thank the authors whose
papers were not published in this issue of the Journal, probably because of
minor shortcomings. However, we would like to encourage them to actively
contribute for the forthcoming issues.


The undertaken Quality Assurance Process involved a series of well defined
activities that, we hope, went a long way in ensuring the quality of the
publication. Still, there is always scope for improvement, and so, we request
the contributors and readers to kindly mail us their criticism, suggestions and
feedback at bijit@bvicam.ac.in and help us in further enhancing the quality of
forthcoming issues.


ABSTRACT

The proposed work describes the performance evaluation of 
different types of ring oscillator Voltage Controlled Oscillator 
topologies on the basis of two characteristic parameters power 
and frequency in 70 nm CMOS technology. The various 
topologies analyzed include Current Starved VCO, VCO with 
Gates of PMOS Transistor Grounded, VCO with PMOS Diode 
Connected, VCO with NMOS Diode Connected, VCO with 
voltage applied to both PMOS and NMOS Transistor. 
Simulation of different parameters of ring oscillator VCO is 
carried out on Tanner tool Version 13. VCO topologies are 
evaluated on the basis of frequency and power consumption by 
taking lower supply voltage of 1.2 V. Performance evaluation 
and comparison of different topologies results in minimum
power consumption of 0.57 uW by Current Starved VCO
topology and maximum operating frequency of 0.57 MHz by
VCO with Gates of PMOS Transistor Grounded.


KEYWORDS 

Current Starved VCO, VCO With Gates of PMOS Transistors 
Grounded, VCO With PMOS Transistors Diode 
Connected, VCO With NMOS Transistors Diode Connected, 
VCO With Voltage Applied To Both PMOS And NMOS 
Transistors 


1. INTRODUCTION 

An oscillator that can be tuned over a wide range of
frequencies by applying a voltage (tuning voltage) to it, or in
other words, an oscillator that changes its frequency according
to a control voltage fed to its control input, is a Voltage
Controlled Oscillator [1]. As shown in Figure 1, the frequency
of oscillation is varied by the applied control voltage, while
modulating signals may also be fed into the VCO to cause
frequency modulation (FM) or phase modulation (PM) [2][24];
a VCO with digital pulse output may similarly have its
repetition rate (FSK, PSK) or pulse width (PWM) modulated.
The oscillator first converts the voltage signal to a current, and then
the current is converted into a frequency [1]. The VCO has numerous
applications ranging from frequency synthesizers to
transceivers. The design of high performance monolithic VCOs
has been one of the active areas of research and development in
recent years [22]. A CMOS VCO can be built using ring
topology, relaxation circuits or LC tuned circuits [2].
Equation (1) shows the basic definition of a VCO according to its
operation, forming a characteristic between input voltage and
frequency.


$\omega_{out} = \omega_0 + K_{VCO} \cdot V_{control}$    (1)


[Block diagram: Vin -> Voltage to Current Converter -> Current to Frequency Converter -> Fout]


Figure 1: “Definition of VCO”


Here, $\omega_0$ represents the intercept corresponding to $V_{control} = 0$
and $K_{VCO}$ denotes the ‘gain’ or ‘sensitivity’ of the circuit [2].
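
As a small worked illustration of the linear tuning law in equation (1), the sketch below estimates the gain from two points on a tuning curve and then predicts the output at a third control voltage. The function names and the two tuning points are hypothetical and are not measurements from this work.

```python
def vco_output_frequency(w0, k_vco, v_control):
    """Equation (1): output frequency for a given control voltage."""
    return w0 + k_vco * v_control


def estimate_gain(v1, w1, v2, w2):
    """Slope of the tuning characteristic between two measured points."""
    return (w2 - w1) / (v2 - v1)


# Hypothetical tuning points: 300 MHz at 0.5 V and 400 MHz at 1.0 V.
k_vco = estimate_gain(0.5, 300.0, 1.0, 400.0)    # gain in MHz per volt
w0 = 300.0 - k_vco * 0.5                         # intercept at zero control voltage
w_at_0v8 = vco_output_frequency(w0, k_vco, 0.8)  # predicted output at 0.8 V
```

A real ring oscillator VCO is only approximately linear, so a gain estimated this way holds over a limited range of control voltage.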

There are basically two types of harmonic oscillators, LC and
ring oscillators. The main advantage of the ring oscillator over the LC
oscillator is that it can be fabricated more easily in
CMOS technology, since the
fabrication of an inductor needs a large amount of space [5][24].


2. DEVELOPMENT OF NEW DESIGN 
METHODOLOGY FOR OPTIMIZATION OF 
POWER WITH LOW VOLTAGE 

The general source of dissipation in any CMOS circuit is the
current drawn while switching. A voltage change on a gate capacitance requires
charge transfer and hence causes power consumption. Once
this gate capacitance is charged, the gate can maintain the DC
voltage level without any additional charge movement and does
not consume any current. The charge required to change
voltage levels on the gate is described by the following
equation [18][24]:

$Q_{gate} = C_{gate} V_{dd}$    (2)

$Q_{gate}$ is the charge required to change state, $C_{gate}$ is the gate
capacitance, and $V_{dd}$ is the power supply voltage. Switching
generates a current proportional to the operating frequency (F) of
the VCO. Since current is defined in terms of coulombs per
second (amperes), the current can be calculated as shown in
equation (3) [19]:

$I = Q_{gate} \times F = (C_{gate} \times V_{dd}) \times F$    (3)

where $I$ is the current in amperes (coulombs per second).

The total current can be generalized into a figure which
includes all the node capacitances in the device:

$I_{device} = C_{total} \times V_{dd} \times F_{osc}$    (4)

where $I_{device}$ is the total device current,
$C_{total}$ is the total node capacitance of all internal switching
nodes, and $F_{osc}$ is the switching frequency of the circuit.
Accounting for all of the internal switching nodes is an unmanageable task;
instead, the current under different conditions can be determined
empirically by measuring the current level for a particular
known frequency and supply voltage, and then
scaling the current value to determine the behavior under
different conditions [23]. These currents depend
directly on the system operation and the supply levels [25].
The VCO power dissipation is a function of its frequency and hence
should be modeled with care.
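
The empirical scaling described above follows directly from equation (4): with the total capacitance fixed, the current scales linearly with frequency and with supply voltage. A minimal sketch is shown below; the function name is hypothetical, and the reference point of 100 uA at 380 MHz and 1.2 V is used purely as an illustrative measurement.

```python
def scale_device_current(i_ref, f_ref, v_ref, f_new, v_new):
    """Scale a measured current using I = C_total * V_dd * F_osc (equation 4)."""
    # C_total cancels out, leaving the ratios of frequency and supply voltage.
    return i_ref * (f_new / f_ref) * (v_new / v_ref)


# Illustrative reference point: 100 uA at 380 MHz and 1.2 V,
# scaled to half the frequency at the same supply voltage.
i_scaled = scale_device_current(100e-6, 380e6, 1.2, 190e6, 1.2)
```

This linear scaling ignores leakage and short-circuit currents, so it is a first-order estimate only.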
$\text{Average Power Dissipation} = F_{osc} \cdot N \cdot C \cdot V_{dd}^2$    (5)
Here $F_{osc}$ is the oscillation frequency, $C$ is the device
capacitance and $N$ is the number of stages in the case of a
ring oscillator.
Assuming the inverters are identical, the oscillation frequency
is given by the equation below:

$f_{osc} = 1 / (N \cdot (t_{PHL} + t_{PLH}))$    (6)
where $N$ is the number of inverters in the ring oscillator and
$(t_{PHL} + t_{PLH})$ is the propagation delay time of each inverter. The
propagation delay times $t_{PHL}$ and $t_{PLH}$ determine the input-to-output
signal delay during the high-to-low and low-to-high
transitions of the output, respectively.
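
Equations (5) and (6) can be checked numerically with a short sketch. The function names, the delay of 100 ps per transition and the 1 fF stage capacitance are hypothetical values chosen for illustration, and the power expression assumes that the supply voltage enters equation (5) squared.

```python
def ring_oscillator_frequency(n_stages, t_phl, t_plh):
    """Equation (6): oscillation frequency of a ring of identical inverters."""
    return 1.0 / (n_stages * (t_phl + t_plh))


def average_dynamic_power(f_osc, n_stages, c_stage, v_dd):
    """Equation (5), assuming the supply voltage appears squared."""
    return f_osc * n_stages * c_stage * v_dd ** 2


# Hypothetical five-stage ring: 100 ps per transition, 1 fF per stage, 1.2 V supply.
f_osc = ring_oscillator_frequency(5, 100e-12, 100e-12)  # frequency in Hz
p_avg = average_dynamic_power(f_osc, 5, 1e-15, 1.2)     # power in W
```

Because the number of stages appears in the denominator of equation (6) and in the numerator of equation (5), adding stages lowers the frequency while the product of frequency and stage count, and hence the power under this model, stays roughly constant.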
The ring oscillator is designed using five CMOS inverters
having

$(W/L)_p = 12/2$ and $(W/L)_n = 5/2$.    (7)
These specifications are chosen according to the relation
$(W/L)_p = 2.5 \, (W/L)_n$    (8)
by applying the condition for a symmetric inverter, i.e. $K_n = K_p$. With
these values of W/L, if the DC characteristics of the CMOS
inverter are observed, the switching point is found to be close to
2.5 ($V_{DD}/2$).
The performance evaluation and comparison, on the basis of
two critical parameters, power consumption and frequency, of the
following topologies of ring type VCOs are discussed in the
present work [24]:
• Current Starved VCO
• VCO with gates of PMOS transistors grounded
• VCO with PMOS transistors diode connected
• VCO with source voltage applied to both PMOS and
NMOS transistors
• VCO with NMOS transistors diode connected
The following sections present, for each topology of ring oscillator, the
proposed circuit schematic, the associated simulation
results, a table of the input
parameters (control voltage and time) over which the parameters
under consideration (power dissipation and frequency) are
calculated, and the respective voltage versus frequency graph.
Each topology is discussed individually at length.


3. CURRENT STARVED VCO 

Figure 2 depicts the proposed current starved VCO. It consists
of a five-stage ring oscillator with a current mirror circuit. This
ring oscillator is designed with an odd
number of inverters which form a closed loop with positive
feedback. Transistors M5 and M6 form a current mirror circuit.
PMOS transistor M3 and NMOS transistor M2 form an inverter
while transistors M1 and M4 are used for biasing.





Figure2: “Schematic circuit of Current Starved VCO” 


3.1 SIMULATION RESULTS
Equation (1) defines the VCO in terms of control voltage and
output frequency. This definition can be further developed for
specific types of VCOs. For example, equation (9) gives the
output frequency of a current starved based stage selectable
VCO [21].


foe = Pf [(Woon Vs) Vi] Wo Va -Veo }/N. Gx Voo [9], 


where $f_{osc}$ is the output frequency generated by the VCO, $\beta$ is
the transconductance parameter, $V_{con}$ is the control voltage,
$V_{DD}$ and $V_{SS}$ are the power supplies, $N$ is the number of stages,
$C_{tot}$ is the total capacitance on the drains of the MOSFETs and $V_{gs}$ is the
gate-to-source voltage of the PMOS.

The current starved VCO circuit shown in Figure 2 was
simulated using the Tanner EDA ver. 13 T-Spice simulator, and a transient
analysis was performed with a pulse input voltage.
The number of stages of the ring oscillator
was optimized with a center frequency of 380 MHz. The current
starved VCO draws 100 uA of drain current from a supply voltage
of 1.2 V.





Figure 3: “Simulation results of proposed current starved
VCO”


The summary of the simulated result waveform is shown in the table
below:


mg rs a BEC T C 

| 

os] os [3.70 [027 [028 
-Es os ees | oas M osr 
o os | 13 





Table 1: “Frequency & Dynamic Power Dissipation of Current
Starved VCO”


A graph is plotted between voltage and frequency, as shown in
Figure 4, in order to observe the relation between these two
quantities.


Figure 4: “Voltage vs. Frequency Plot of Current Starved
VCO”


4. VCO WITH GATES OF PMOS TRANSISTORS
GROUNDED
Figure 5 depicts the proposed VCO with gates of PMOS
transistors grounded. It consists of a five-stage ring oscillator.
This ring oscillator is made of an odd number of inverters which
form a closed loop with positive feedback. In this type of
VCO, the PMOS transistor is always ON since its gate terminal
is connected to ground, and the PMOS transistor
passes a strong ‘1’, so it behaves as a resistor.





Figure 5: “Schematic Circuit of proposed VCO with gates of 
PMOS Transistors Grounded” 


4.1 SIMULATION RESULTS

The VCO with gates of PMOS transistors grounded was
simulated using the Tanner EDA ver. 13
simulator, and a transient analysis was performed with a
pulse input voltage; the result is shown in Figure 6. The number of
stages of the ring oscillator was optimized with a center frequency
of 470 MHz. The VCO with gates of PMOS transistors
grounded draws 100 uA of drain current from a supply voltage
of 1.2 V.





Figure 6: “Simulated results of proposed VCO with Gates of
PMOS Transistor Grounded”


The summary of the simulated result waveform is shown in the table
below:
í Control Time | Frequency | Dynamic 
Voltage (V) (ns) (GHz) 



















i o oon o 
Hie oes 
404 | 286 o | 026 
sos [23s [0302 [oat 


8 | 08 |20| 0S | 135 | 
Co | os pe? os | 18 
Table 2: “Frequency & Dynamic Power Dissipation of VCO
with Gates of PMOS Transistor Grounded”


A graph is plotted between voltage and frequency, which is 
shown in the Figure 7, in order to observe the relation between 
these two quantities. 





Figure7: “Voltage vs. Frequency Plot of VCO with gates of 
PMOS transistors grounded” 


5. VCO WITH PMOS TRANSISTORS DIODE
CONNECTED

Figure 8 depicts the VCO with PMOS transistors diode
connected. It consists of a five-stage ring oscillator. This ring
oscillator is designed by back-to-back connection of an odd
number of inverters which form a closed loop with positive
feedback as per the requisite Barkhausen criteria.


5.1 SIMULATION RESULTS
Figure 9 shows the simulated waveform of the proposed
VCO, obtained from the transient analysis of the PMOS transistor
diode connected VCO with a pulse input voltage, simulated using
Tanner EDA ver. 13 simulator. In this VCO, the gates of the upper
PMOS transistors are connected to their drains. The source
voltage is applied to the gates of the lower NMOS transistors.





Figure 8: “VCO with PMOS Transistor Diode Connected”


Figure 9: “Simulated results of proposed PMOS transistor
diode connected VCO”


The summary of the simulated result waveform is shown in the table
below:


; Time | Frequency 
vas (ns) TE 


Tos ae [ome oe 
oe Oe 
Ca | 0a sss | 0.179121 
















ps [os 349 os [26 
e [06 [349 | o2 | 425 





a | 08 [sie | os | 338 
3 os fais 032 | 10951 
To [10 [307 | 032 | 1332 
in| it [ 295 [033 [16.89 


Table 3: “Frequency & Dynamic Power Dissipation of
VCO with PMOS Transistor Diode Connected”





A graph is plotted between voltage and frequency, which is 
shown in the Figure 10, in order to observe the relation 
between these two quantities. 


Figure 10: “Voltage vs. Frequency Plot of VCO with PMOS
Transistor Diode Connected”


Figure 11: “Schematic Circuit of proposed VCO with NMOS
transistors diode connected”


6. VCO WITH NMOS TRANSISTORS DIODE 
CONNECTED 

Figure 11 depicts the VCO with NMOS transistors diode
connected. It consists of a five-stage ring oscillator. This ring
oscillator is designed by back-to-back connection of an odd
number of inverters which form a closed loop with positive
feedback as per the requisite Barkhausen criteria. The VCO
circuit is simulated using Tanner EDA T-Spice simulator ver.
13.


6.1 SIMULATION RESULTS

Figure 12 shows the simulated waveform of the proposed
VCO, obtained from the transient analysis of the NMOS transistor
diode connected VCO with a pulse input voltage. In this VCO, the gates of the
lower NMOS transistors are connected to their drains. The
source voltage is applied to the gates of the upper PMOS
transistors.


Figure 12: “Simulated results of proposed NMOS transistor
diode connected VCO”


The summary of the simulated result waveform is shown in the table
below:
i Time | Frequency 
volage (ns) "(GHD 


tH EAR 10 ME 
=a 
ee o 
C4] o [3m] oar | 020 
rs | 0s | 333 | 030 | 031 
os | 357 | 0.28 | 0.425 
e | os | 613 | 026 | 0432 | 
s | 09 | 925, 010 | 034 | 
i, e a a 
Bel ar J e ee l 


Table 4: “Frequency & Dynamic Power Dissipation of
VCO with NMOS Transistor Diode Connected”















A graph is plotted between voltage and frequency, which is 
shown in the Figure 13, in order to observe the relation 
between these two quantities. 


Figure 13: “Voltage vs. Frequency Plot of VCO with NMOS
Transistor Diode Connected”


7. VCO WITH VOLTAGE APPLIED TO BOTH PMOS
AND NMOS TRANSISTORS
Figure 14 depicts the VCO with voltage applied to both PMOS
and NMOS transistors. It consists of a five-stage ring oscillator.
This ring oscillator is designed by back-to-back connection of an
odd number of inverters which form a closed loop with
positive feedback as per the requisite Barkhausen criteria. In
this VCO, the two transistors M5 and M6 are eliminated and
the source voltage is applied to the gates of both the lower NMOS
transistors and the upper PMOS transistors. The VCO circuit is
simulated using Tanner EDA T-Spice simulator ver. 13.


7.1 SIMULATION RESULTS

Figure 15 shows the simulated waveform of the proposed VCO,
obtained from the transient analysis of the VCO with voltage applied to both
PMOS and NMOS transistors with a pulse input voltage. This
VCO has a center frequency of 380 MHz with a power dissipation of 0.58 uW.





Figure 14: “Schematic Circuit of proposed VCO with voltage
applied to both PMOS and NMOS transistors”


Figure 15: “Simulated results of proposed voltage applied to
both PMOS and NMOS transistor VCO with pulse input voltage”


A graph is plotted between voltage and frequency, which is
shown in Figure 16, in order to observe the relation
between these two quantities.


Table 5: “Frequency & Dynamic Power Dissipation of voltage
applied to both PMOS and NMOS transistor VCO”


Figure 16: “Voltage vs. Frequency Plot of VCO with voltage
applied to both PMOS and NMOS transistors”


8. COMPARATIVE ANALYSIS OF VOLTAGE VS 
FREQUENCY PLOTS OF RING OSCILLATOR VCO 
TOPOLOGIES 

On the basis of the results derived from the graphs plotted for the
two critical parameters of the ring oscillator VCO, dynamic power
dissipation and frequency, at different times and control
voltages, for the various ring oscillator topologies, a table
is formed which compares the results of the topologies under
consideration. Table 6, shown below, establishes the
comparison.


S.No.  Topology

VCO with voltage applied to both PMOS and NMOS Transistor


Table 6: “Performance Comparison of topologies under
consideration in terms of power and frequency”





The Table 6 results are plotted in the form of a graph, as shown
in Figure 17, which establishes a comparative study
of the voltage vs. frequency plots of the various ring topologies of
VCO.


[Plot legend: Current Starved VCO; VCO with gates of PMOS transistors grounded; VCO with PMOS transistors diode connected; VCO with NMOS transistors diode connected; VCO with source voltage applied to both PMOS and NMOS transistors]


Figure 17: “Comparative analysis of Voltage Vs Frequency of
various ring topology VCO”


CONCLUSION 

The proposed work establishes the design and comparison of
Current Starved VCO, VCO with Gates of PMOS Transistor
Grounded, VCO with PMOS Diode Connected, VCO with
NMOS Diode Connected, and VCO with voltage applied to both
PMOS and NMOS Transistor, totalling five different VCO
topologies, on the basis of their voltage, power and frequency in
70 nm CMOS technology. These VCO topologies are
designed using a ring oscillator.

The different ring oscillator VCO topologies under consideration
are simulated on Tanner EDA tool ver. 13. The supply voltage
used for the simulation is 1.2 V. Different topologies exhibit
different power dissipation and frequency characteristics. Their
performance comparison is obtained by plotting
voltage against dynamic power dissipation. Performance
evaluation and comparison of the different topologies results in
minimum power consumption of 0.57 uW by Current Starved
VCO topology and maximum operating frequency of 0.57
MHz by VCO with Gates of PMOS Transistor Grounded.


ABSTRACT

The characteristics of the Internet, which include digitization,
anonymity, connectivity, mobility, and transnational nature,
blur the traditional model of crime investigation / law
enforcement and call for new strategies. Many simple yet
popular malicious activities over the internet are carried out by
scammers / con artists largely through electronic mails and
websites using cyber hypnotism, often in combination with
distribution and propagation of malware. Such crimes of
persuasion need to be managed through due diligence.

Recently, an added web-based threat has been recognized in the
form of scareware, which is making even the savviest of
computer users its victims, and therefore, there is a need to
focus on trying to detect such suspicious activity as quickly as
possible in order to shut it down. An in-depth analysis of a few
scareware programs reveals that they have created many new and not so
widely recognized online threats with inside intelligence by
providing a primary delivery mechanism for malware such as
rogue anti-virus and anti-spyware, which are beyond the
reach of many legitimate anti-virus programs currently in use.

They may cause a Denial of Service (DoS) attack forcing the
system to crash or even a Distributed Denial of Service (DDoS)
attack. Looking at such unprecedented challenges in
cyberspace, a policy of cyber vigilantism adopting an active
defense rather than a reactive approach is contemplated. It
is felt that in this age of mobile workforce, many
people working as cyber analysts or cyber-crime researchers
may accomplish this work of community policing and act as
proactive guardians of cyberspace.


KEYWORDS 
Internet, Cyber Hypnotism, Con Artist, Scareware, Cyber 
Vigilantism. 


1. INTRODUCTION 

The Internet, as understood today, is a vast global network of
computers storing information on every conceivable subject of
interest to humankind. Its original designers aimed to create
a communication system between trusted people and
organizations for academic and military purposes, resilient in
the face of a nuclear attack. No consideration was given to the
security of either the computers attached to these networks
or the information stored in these computers. The
commercial use of the internet came as an afterthought. Today, it
has evolved from a mere means of communication to an open
and insecure worldwide network. The real world’s
constraints such as time and space do not exist on it.
National boundaries have little meaning in cyberspace and
information flows continuously and seamlessly across
political, ethical and religious divides. Even the
infrastructure that makes up cyberspace (software and
hardware) is global in nature. Because of this global nature of
cyberspace, the existing vulnerabilities are also open to the
world and to anyone, anywhere, who has sufficient
capability to exploit them. These infrastructures are, therefore,
being continuously probed for weakness and vulnerability by a
new breed of professional cyber criminals primarily motivated
by huge financial gains. In recent years, several electronic
mail frauds and scareware, herein discussed as crimes of
persuasion, have brought to light the darker side of the
Internet.

This paper examines the concept that industry, government
and the public are essentially naked in cyberspace, with
privacy diminishing, identity theft increasing, and financial
accounts and intellectual property becoming highly vulnerable
to cyber criminals. Taking lessons from real-world incidents,
this paper discusses attackers’ techniques in general terms,
more particularly those related to cyber hypnotism (i.e.,
hypnotizing people through the internet by exploiting various
human weaknesses and emotional vulnerabilities in
cyberspace) and related crimes of persuasion including email
fraud and scareware (a type of malware). In this
context, hypnotism, as understood, is “a wakeful state of
focused attention and heightened suggestibility, with
diminished peripheral awareness, usually induced by a
procedure known as hypnotic induction, which is commonly
composed of a long series of preliminary instructions and
suggestions”. This is contrary to the popular misconception
that hypnosis is a form of unconsciousness resembling sleep.
Malware, also known as malicious code or software (e.g.,
viruses, Trojan horses, worms, keyloggers, scareware, spyware
etc.), is meant specifically to damage or disrupt a system
irreparably and to steal the personal information and address
books existing on the system in cache memory / records, by
hijacking the browser and redirecting it to a phishing / con
webpage. Evidently, many cyber-crimes, largely carried out
through a series of hypnotizing emails, are often associated
with malware and warrant constant vigilance at the individual
level besides better technical controls. It is noteworthy that
the majority of such cyber-crimes do not require a high level of
technical specification and can be prevented through ‘due
diligence’; nevertheless, sophisticated cyber-crimes demand an
altogether different approach. Cyber Vigilantism, as visualized
in this communication, is “a proactive policy to attack the
attackers in an ethical way with restraint in a limited manner
rather than adopting a soft policy of passive reaction.” In
turn, it will encourage good guys into action and discourage
the bad guys in the near future.


2. E-MAIL FRAUD AND CYBER HYPNOTISM 
Cyber-crime is regarded as computer-mediated activities 
which are either illegal or considered illicit by parties and 
which can be conducted through global electronic networks’. 
Conceptually cyber-crimes differ little from traditional crimes 
as they frequently involve perpetrators who have no physical 
presence at or even near the crime scene. In such crimes 
when no direct physical evidence exists, inferential 
evidence, or evidence that some aspect of the system has 
been modified as a direct result of the intrusion, is the primary 
source of clues. In fact, cyber-attacks come in two forms: one 
against data, the other on control systems. The first type 
attempts to steal or corrupt data and deny services. The vast 
majority of internet and other computer attacks fall into this 
category. Individuals, who wish to use computer as a tool 
to facilitate unlawful activity are finding that the Internet 
provides a vast, inexpensive, and potentially anonymous 
way to commit unlawful acts. The Net enables transaction 
between people who do not know, and in many cases cannot 
know each other’s physical location. As of 2012, the 
estimated number of Internet users worldwide reaches 
2,267,233,742°. Most of these users have electronic mail 
(email) accounts on one or more mail systems and 
emails are being utilized by cyber criminals as the vehicle of 
persuasion. According to Symantec, a security-software 
vendor, about nine-tenths of the 140 billion emails sent daily 
are spam (unsolicited bulk commercial emails); of these about 
16% contain money-making scams including phishing 
attacks. E-mail fraud relies on naive individuals who put
their confidence in ‘get-rich-quick’ schemes such as ‘too- 
good-to-be-true’ investments or offers to sell popular items 
at ‘impossibly low’ prices. In this, confidence tricks tend to 
exploit the inherent greed and dishonesty of their victims: the 
prospect of a ‘bargain’ or ‘something for nothing’ can be 
very tempting. Over the years, email has evolved from a
means of easy communication to one of the cornerstones of a 
large-scale criminal economy. For example, spam today is best
known as a way to steal a person’s identity and sensitive data
or to gain access to corporate intellectual property, and is used
for phishing. The spam emails are often sent from Internet cafes
equipped with satellite Internet at a very low cost. They 
consume significant resources of targeted computer and are 
used as a delivery mechanism for cyber-attacks. Spam 
often contains viruses, worms, scams, and drive-by 
download malwares. Symantec reports that 91.9% of email
traffic is spam and that 95% of all spam is generated by
botnets (i.e., malware-infected, remotely controlled
computers). It may be noted that by opening spam, users
open their machines and their entire network to become 
members of a botnet, which can compromise the entire 
network. 

Phishing is a variation on “fishing”, “the idea being that
bait is thrown out with the hopes that while most will ignore
the bait, some will be tempted into biting”. It is the act of
attempting to fraudulently acquire sensitive information by 
masquerading as a trustworthy person or business with a real 
need for such information in a seemingly official electronic 
notification or message (most often an email, or an instant 
message). Phishing may lead to identity theft and fraud by
finding out the users’ personally identifiable information (PIIs)
such as user name, passwords and credit card details, typically
for economic gain, by masquerading as a trustworthy entity
in an electronic communication, such as pretending to be from
a well-known organization, a legitimate online retailer, a
trustworthy company, a bank, a government agency or someone
claiming to be a prospective employer. Some phishing emails
try to convince that something good will come from 
participation. More commonly, phishing attacks use email or 
malicious web sites to solicit personal, often financial 
information. Clicking a link in a phishing email typically 
takes one to a fake website that may even be related to a
scareware website. Common methods of installing malware
in phishing attacks are fake
advertisements or ‘popup’ windows on web sites. Experts
suggest not clicking on links directly from a suspicious
e-mail. Similarly, it may be mentioned that secure web sites
use a technique called SSL (Secure Sockets Layer),
indicated by HTTPS:// instead of HTTP:// at the beginning of
the address (the “S” stands for “Secure”) and by a locked
padlock icon which must be found either at the address bar or
in the bottom right hand corner of the browser window. A
padlock appearing anywhere else on the page does not 
represent a secure site. It is also suggested that if the first part of
the web address consists of numbers, the site should
probably not be trusted. Phishing attacks usually use a
combination of email spoofing and web spoofing to trick 
people into giving personal and financial information. In 
particular, phishing and pharming (luring people to disclose 
sensitive information by using bogus emails and websites) 
are two popular security threats that netizens and financial 
institutions are facing at large. Pharming is a hacker's attack
aiming to redirect a website's traffic to a bogus website where
they harvest the users’ information. Pharming can be
conducted either by changing the hosts file on a victim's
computer or by exploitation of a vulnerability in DNS server
software.
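The two address checks suggested above (an HTTPS prefix, and distrust of an address whose first part consists of numbers) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `url_warnings` and the warning labels are assumptions introduced here, and passing both checks does not prove that a site is genuine.

```python
import ipaddress
from urllib.parse import urlsplit


def url_warnings(url):
    """Return warnings for the two address checks described in the text."""
    parts = urlsplit(url)
    warnings = []
    # Check 1: the address should begin with https://, not http://
    if parts.scheme.lower() != "https":
        warnings.append("not-https")
    # Check 2: a host written purely as numbers (an IP literal) is suspicious
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        warnings.append("numeric-host")
    except ValueError:
        pass  # the host is a name, not an IP address
    return warnings
```

For instance, an address such as `http://85.31.101.148/login` would trigger both warnings, whereas `https://example.com/` would trigger none.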

An added threat in the new millennium is the use of the
internet as a useful tool by scammers/con artists, using
hypnosis as a tool to make money by exploiting various 
human weaknesses and emotional vulnerabilities. Hypnotizing 
people through the internet has a greater range. It allows the 
scammers / con artists to manipulate the victim's mind and 
play with it like revealing something that he / she doesn't want 
to reveal or making him / her to do something else. Given 
email’s nature of human to human communications, it 
is being used as a social engineering vehicle by con 
artists/scammers. It is observed that many of the cyber-crimes
related to data theft and identity theft display a judicious mix
of cyber hypnotism and malwares. Recognizing the
convergence of cyberspace and hypnotism, the author
has used cyber hypnotism as a basket term for the type of
scams and frauds, referred as crimes of persuasion, popularly 
known as London scams, Nigerian fraud, Canadian fraud, 
Romance scam, Lottery scam etc. These are scams that 
appeal to people’s greed, goodwill or other emotions to use 
the victim to provide the access and assistance to information, 
the money or other resources, that are the target of the 
criminal. What is common in all these scams is that scanned 
versions of official documents are emailed to potential 
victims in order to convince the genuineness of the 
transaction. Internet users, now, need to be more vigilant as 
new and more insidious mind tricks arise every day, 
especially if a message is either too good or too bad to be 
true. Users are too often seduced by a wonderful offer or alert. 
It is suggested to be suspicious if someone contacts 
unexpectedly and asks for personal information. Appeals to 
achieve happiness via increased wealth, relationships or health 
are tempting people into becoming victims. It is noted that if
the message appears to be one of gain then promotion-
focused individuals tend to be motivated and get attracted to
go ahead, whereas others, who are prevention-focused
individuals, tend to be motivated to avoid the sky falling, i.e.,
heavy losses. Both types of individuals are deceived by the
apparent legitimacy of such messages. In
cyberspace, for instance, in order to convince the recipient of the legitimacy
of the email, all publicly available highly-personalized
information is included by scammers/con artists in such 
emails. Furthermore, they create a story line in such a way that 
it induces the emotional sensitivity of the innocent human 
beings, and then they ask either for wire transferring the 
money or provide a website link within an email, which serves 
many purposes. For example, the link may be relating to a 
login page of any financial institution so as to get the PIIs from 
the legitimate users, resulting not only in the theft of login 
information but also in the identity theft as well as credit card 
fraud. The link may be relating to a login page of any email 
account like Gmail, Yahoo, RediffMail etc. where the user 
enters his / her login information and unknowingly helps the 
con artists in delivering his / her secret credentials. Afterwards, 
the scammer / con artist may use the contacts present in the 
address book and will try to scam other people from the 
contacts. Similarly, the link may be related to a legitimate 
website embedded with blended malware which makes the 
website visitors’ machines cyber-victims. However, this
remains hidden from the first owner of the computer
system. In this way, the scammer/con artist creates a
backdoor in the computer system and is able to monitor
and control the computer system remotely. This information
is generally sold in the underground market of internet. All 
such incidents are happening because people are hypnotized 
to such an extent that they are ready to believe what the 
scammers / con artists are trying to convey. Furthermore, a 
widespread use of commodity operating systems and software 
products delivering rich functionality but lacking security has 
aggravated the problem. 


Investigation of cyber-crime cases and appraisal of threat 
data analyses of online crimes especially related to cyber 
hypnotism reveals that many of these are being carried out 
with basic equipment and a simple scheme with little
effort. Hence, contrary to popular belief, most of such
attacks perpetrated against computer systems do not require a 
high level of technical sophistication, yet present an 
unprecedented challenge for law enforcement authorities. As 
technologies become more user-friendly, computer-users 
require less computer knowledge and are, therefore, more 
vulnerable to cyber-crime, home users perhaps the most. Often 
poorly protected, personal computers are a favorite target for 
such criminals. 


3. CYBER VIGILANTISM: A DISCUSSION 

McAfee reports a 660% rise in scareware over the past two 
years, and a 400% increase in reported incidents in 12 months. 
It also reports that cybercriminals make profits upwards of 
$300 million worldwide from scamming consumers with 
scareware [10]. A study conducted by U.K. Government in 
February 2011 on the cost of cyber-crime reports that U.K. 
citizens are losing £30m due to scareware and fake anti-virus.
Furthermore, it is argued that fake anti-virus software
operations generate many millions of dollars, and investing this
dirty money into Internet Service Providers (ISPs) for
shady dealings is also emerging as a very sensible move
for bad guys. ISPs are often accused of not doing enough to
police illegal traffic. In order to curb scareware, it is felt
that anti-virus deployment in computer systems must be
made mandatory while hiring an internet connection from
ISPs or their vendors, which may be audited by ISPs at the
time of providing internet services to their potential
customers and may be counterchecked by cyber vigilantes. It
is observed that cyber criminals are increasingly using highly 
reputable and popular legitimate websites and social 
networking pages to infect computers. 

Looking at such unprecedented challenges, the author strongly 
advocates a policy of involving high tech cyber security 
experts and encouraging cyber vigilantism with government 
as regulatory authority to co-ordinate them. During cyber- 
crime investigations, the exact nature and positioning of 
cyber-crime evidence can be crucial to unraveling the chain of 
events. Time stamps in logs, records of network activity, new 
directories and files created by the attacker, incoming /
outgoing mail or other packets during the period when the
intruder was actively exploiting the system: all of these are
important pieces of the overall puzzle. It is suggested that
these professional cyber vigilantes can be utilized to gather 
such evidences. Cyber Vigilantism by private citizens is a 
response to their frustration with the number of rogue sites 
in operation and what they believe is the unwillingness or 
inability of our government to take them down. It is felt that 
conventional law enforcement just can’t match the skills 
needed. Besides, one can’t trust law enforcement to keep one’s
secrets from becoming public knowledge. It is worth
mentioning that after 9/11, it was self-styled vigilantes who came to
America’s rescue. Similarly, Cyber Vigilante Groups may be
used as a source of information in the fight against fraud,
for example by consuming the bandwidth of fraudulent banking and
lottery sites in an attempt to force them off the internet. It is
learnt that few of the cyber vigilantes are open source cyber 
analysts who also visit extremist sites to glean information. 
One of the most famous examples of such Cyber Vigilante
Groups is The Jester (th3j35t3r), who forces Cyber Jihad
websites offline. They can offer advice and tools on how to avoid
scammers and list suspected fraudulent websites. They can
search logs of Internet service providers for “attack
packets” and try to trace where they are coming from and 
who is behind them. They can also assist when businesses 
are faced with the first manifestations of cyber-crimes, such as 
threats besides educating internet users so that they take basic 
precautions when surfing the web. In addition, cyber 
vigilantes may also be utilized for website and domain ratings 
to benefit users. It is worth recording that the most interesting
action occurs behind the scenes, wherein security vendors,
internet service providers, domain name registrars and some
of the most talented individual researchers globally 
communicate every day on new attacks, compromises, bots 
and threats. Malware and exploit samples, locations of 
compromised hosts and information on crime ware are shared 
as quickly as the information is generated. Most importantly, 
these non-governmental people, many of them working for 
free, are all there working together as cyber vigilantes to thwart
various cyber threats. Such voluntary security
professionals/academia take reports every day from the
internet, provide timely and actionable information on
botnets and malware threats and also pass this information
to the public.


4. SCAREWARE: METHODOLOGY
Scareware is malware masquerading as free or trial anti-virus
software or some other free online scam. In this context, the
author, while performing routine activities in the month
of July 2010 in Computer Aided INvestigative
Environment (CAINE), a Linux operating system that
offers a complete forensic environment, received an email
message about an international cyber security conference
(Figure 1). Since the email was highly personalized,
the author thought it to be a benign one. It is
noteworthy that the email was received in the inbox and
not in the spam folder. It contained a link to the
international conference’s website. In order to know more
about the theme and topics of the conference, the author 
clicked on the given link. The website too looked like a 
legitimate one. After a few seconds, the author observed a
pop-up alert with a warning message of privacy violations.
It gave an impression as if a trial version of anti-virus software
had scanned the system under reference but was unable to clean
these privacy violations. The popup alert also displayed a
recommendation to purchase the full version of the software to
remove privacy violations (Figure 2).
Figure 2: “Popup Alert”


After selecting the ‘Remove Now’ option, an Antivirus XP
Professional tool got downloaded, which reported that the
system was infected with serious threats, displaying a full-
screen image of the ‘My Computer’ environment that
always appears in Microsoft Windows XP, with a message
to remove the Viruses and Trojans found in the system (Figure
3). The Windows Operating System typeface (a look similar
to ‘My Computer’ in the Windows XP environment) raised an
immediate suspicion about its
genuineness since the author was working in a Linux
environment as shown in Figure 1. Hence, it was decided
to carry out further investigation.

Evidently, the threat warning was a fake one, probably a
scareware (rogue anti-virus and anti-spyware program) attack.
Usually, scareware sellers use popup advertisements 
deliberately designed to look legitimate using the same 
typefaces as Microsoft and other well-known software 
providers. They appear, often when the user is switching 
between websites, and falsely warn that a computer's security 
has been compromised. If users click on the popup message, 
they are directed towards another website where they can 
download the fake anti-virus software supposedly needed to
clean up their computer.
Figure 3: “Fake Warning of Infection (AntiVirus XP
Professional)”


After clicking on “Remove All” in Figure 3, a website
opened having the URL www.scan4you.biz and IP
address ‘85.31.101.148’, which was reverse mapped as
‘85.31.101.148.staticnano.lv’ with the Route / AS as
‘85.31.96.0/21’ / ‘43513’. On further forensic analysis of the
popup and the website, it was found that the image shown in
the website was hosted at
“http://i080.redikal.ru/1004/c9/760db777d446.jpg” (Figure 4).





Figure 4: “Image of Website from where Scareware was
downloaded”
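The relationship between the reported IP address and the reported route can be checked offline. The following is a minimal sketch using Python's standard `ipaddress` module; the helper names `in_route` and `reverse_pointer` are introduced here for illustration only, and the sketch does not perform any live DNS or WHOIS query.

```python
import ipaddress


def in_route(ip, cidr):
    """True if the address lies inside the announced route (CIDR block)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def reverse_pointer(ip):
    """Name that a reverse (PTR) lookup for this address would query."""
    return ipaddress.ip_address(ip).reverse_pointer
```

With the values reported above, `in_route("85.31.101.148", "85.31.96.0/21")` holds, since the /21 block spans 85.31.96.0 to 85.31.103.255.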


On clicking the ‘Buy Now’ button shown in the website, 
a fraudulent payment page opened asking for all PIIs as 
well as financial information (viz. username, password for 
login into the anti-virus account, email id, credit card number, 
PIN number, issue/expiry date, full name of the card holder). 
Further details of the payment page are given in Table 1. 


URL of Payment Page |
Hosted IP Address | 92.241.177.188
ISP | OAO Webalta
Location | Yoshkar-Ola, Russia
Table 1: “Fraudulent Payment Page Details”





Instead of providing the information on the fraudulent 
payment website, the author registered the scareware by 
reverse engineering the downloaded malicious tool for further 
investigation in Windows XP environment. After installation, 
at the very first instant, the following major changes were
observed:

• Task Manager was disabled
• Malfunctioning of [Ctrl] + [Alt] + [Delete] command
• ‘Folder Options’ disabled
• New registries were created, and the existing ones
were modified
• Default start page and default search engine of
Internet browsers were changed to
www.lameplaying.com/index.php/database

Furthermore, the following files were being created in each and
every folder with hidden attribute selected by default:

• tsjgiq.exe
• khx <no extension>
• khy <no extension>
• <foldername>.EXE, i.e., a copy of each folder with an
extension of .EXE
Apparently, all the above files were using a rootkit technique. A
rootkit is software that enables continued privileged access to
a computer while actively hiding its presence from
administrators by subverting standard operating system
functionality or other applications. It may be noted that
once a rootkit is installed, it allows an attacker to mask the
ongoing intrusion and maintain privileged access to the
computer by circumventing normal authentication and
authorization mechanisms.
Later, on entering the URL of Gmail.com in the Mozilla Firefox
web browser, it opened the website
www.lameplaying.com/index.php/database (IP address:
67.215.65.132), demonstrating the possibility of Pharming.
Furthermore, network forensics was carried out by installing an
Intrusion Detection and Prevention System (IDPS) in order
to monitor the network activities being carried
out after the installation of scareware. It indicated that the
scareware was continuously sending packets of variable
size (in bytes) to an IP address with the following log analysis
data (Table 2).


Hosted IP Address | 92.241.190.172
Company / ISP | Heihachi Ltd. / OAO Webalta
Location | Moscow, Russia
Table 2: “Details related to external IP Address”
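One of the two pharming routes mentioned earlier is a modified hosts file. The study does not state which route was used in this case, so the following is only a hedged sketch of how an investigator might check hosts-file text for redirected domains. The function names and the sample entry are hypothetical and introduced here for illustration.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the IP address it is pinned to."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip
    return mapping


def overridden(text, watched):
    """Return the watched domains that the hosts file redirects."""
    mapping = parse_hosts(text)
    return {domain: mapping[domain] for domain in watched if domain in mapping}
```

A hosts file that contains a line such as `67.215.65.132 gmail.com` (a hypothetical entry) would be reported by `overridden(text, ["gmail.com"])`, whereas a clean hosts file would yield an empty result.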












It is noteworthy that although an internet connection of 2
Mbps (= 2,048 Kbps) was being used, there was a significant
change in the network bandwidth before and after the
installation of scareware, as tabulated in Table 3
and Table 4, respectively.


Ping | Download | Total Bandwidth
34 ms | bps | 460.8 Kbps


Table 3: “Before installation of scareware” 


Ping | Download | Upload | Total Bandwidth


Table 4: “After installation of scareware” 
Network forensics on the packets captured and the log files
obtained from IDPS revealed that multiple requests and
information were being sent as given in Table 5.

IDPS logs indicated port ranges between 1888 and 21372.
Furthermore, for the IP address 77.91.227.248:2129, the
remote system’s MAC ID (i.e., machine address) was found to
be 00-19-E0-A0-B2-8E. A route map of jebena.ananikolic.su
is given in Figure 5.


[Table 5 is only partly legible in the source. Recoverable content: column headings “Protocol / Port” and “IP Address”; hosts ananikolic.su and jebena.ananikolic.su; IP addresses 92.241.190.139, 92.241.190.172 and 92.241.190.237; location Moscow, Russia.]
Table 5: “Details of network forensic”
Figure 5: “Route Map of jebena.ananikolic.su”


Additionally, when the scareware was run in a network 
environment, there were multiple PING requests being relayed 
with payload (Table 6). 


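[Table 6 is only partly legible in the source. Recoverable content: column headings “No.”, “Type” and “Information”; entries mention ICMP, a BOOTPS / DHCP server request and the address 239.255.255.250; the caption ends with “environment”.]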


On further investigation, it was revealed that by using IPNAT
(IP Network Address Translator) a request list was being
sent at a very short but regular interval to the Russian IP
addresses containing the following parameters:

• Subnet mask
• Domain name
• Router
• Domain Name Server
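The four parameters listed above match entries that commonly appear in a DHCP Parameter Request List (option 55 in RFC 2132), which carries one option code per byte. The study does not confirm that the observed request list was encoded this way, so the following Python sketch rests on that assumption; the function name `decode_request_list` is introduced here for illustration.

```python
# Option codes as assigned in RFC 2132.
DHCP_OPTION_NAMES = {
    1: "Subnet Mask",
    3: "Router",
    6: "Domain Name Server",
    15: "Domain Name",
}


def decode_request_list(payload: bytes):
    """Translate the bytes of a Parameter Request List into option names."""
    return [DHCP_OPTION_NAMES.get(code, f"option {code}") for code in payload]
```

Under this assumption, a payload of the bytes 1, 15, 3 and 6 would decode to the same four parameters in the order given in the list above.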
Finally, the source file functioning as scareware was located;
it had the following aliases. Interestingly, these
aliases were changing their names randomly after every reboot
from the following ones:
NEBIH.EXE 141,824 bytes
RMHZB.EXE 138,752 bytes
1412294.EXE 140,800 bytes
86221.EXE 133,120 bytes
73911852.EXE size was varying after every click


After performing intensive malware forensic, some very 
interesting information was uncovered. These were: 


(1) Shell Command of Scareware: 


shell\\\open\\\command=VEROVALA\\\\\nebih.exe
shellexecute=VEROVALA\\\\\nebih.exe
shell\\\explore\\\command=VEROVALA\\\\\nebih.exe
icon=SHELL32.dll, 4
open=VEROVALA\\\\\nebih
USEAUTOPLAY=1


Figure 6: “Scareware shell command” 
Tracking Digital Footprints of Scareware to Thwart Cyber Hypnotism Through Cyber Vigilantism in Cyberspace


Here, the command icon=SHELL32.dll, 4 as shown in Figure 6
was actually trying to conceal its icon as shown in number 4.







(2) MD5 hash value of NEBIH.EXE is as under:
04509f3c5ef5b90a7addc09e65454fcc350745a39
(3) Following website links were found:
• www.egydown.com
(website for cracked software),
• www.filestube.com
(malicious search engine),
• www.thepiratebay.org (torrent website),
• jebena.ananikolic.su (pornographic contents)


(4) Additionally, an encrypted HTML file was also revealed 
on the following path: 

C:\Documents and Settings\<username>\local 
settings\temp\cfircyh => HTML.Crypted!IK 

It appears that to avoid detection by antivirus software,
authors of HTML.Crypted!IK malware use browser
features like JavaScript and Visual Basic Script. These
scripts are small and very often contain quite simple encryption
routines hiding the malicious parts of the script. To
date, the author has not been able to decrypt the HTML file.

(5) VEROVALA and NEBIH.EXE were showing a unique
behavior in that they were using the same file icon as the
antivirus software installed in the system under 
reference. Perhaps, it was done in order to fool those 
victims / users who just click on the icon but never read 
the full filename (with extension). As soon as any Plug- 
and-Play (UPnP) device was inserted, this file used to 
copy itself into it with the hidden attribute selected by 
default. 

(6) Behavioral Pattern of VEROVALA / NEBIH.EXE: The
registry values added to the system as soon as the USB
drive is inserted were:

• HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced|Hidden
• HKEY_USERS\S-1-5-21-4058357071-1071901202-2123665184-1000\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced|Hidden
• HKEY_USERS\S-1-5-18\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced|Hidden
• HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced|Hidden
• HKEY_USERS\S-1-5-21-4058357071-1071901202-2123665184-1000\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced|Hidden
• HKEY_USERS\S-1-5-18\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced|Hidden
• HKEY_USERS\S-1-5-21-4058357071-1071901202-2123665184-1000\Software\Microsoft\Internet Explorer\Main|Default_Search_URL
• HKEY_USERS\S-1-5-21-4058357071-1071901202-2123665184-1000\Software\Microsoft\Internet Explorer\Main|Search Page
• HKEY_USERS\S-1-5-21-4058357071-1071901202-2123665184-1000\Software\Microsoft\Internet Explorer\Search Assistant
• HKEY_LOCAL_MACHINE\Software\Microsoft\Internet Explorer\Main|Start Page

(7) RMHZB.EXE created the following registry keys:

• \Registry\Machine\Software\Microsoft\Windows NT\CurrentVersion\Winlogon
• C:\documents and settings\<username>\application data
• My Computer\HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\Taskman

(8) RMHZB.EXE and NEBIH.EXE were seen to exhibit the
following behavior:

• Uses rootkit technologies to conceal its presence and resist
interrogation or removal.
• Found on infected systems and resists interrogation
by security products.
• Has code inserted into its Virtual Memory space by
other programs.
• Writes to another Process’s Virtual Memory (Process
Hijacking).
• Created as a Process on Disk.
• This Process deletes other processes from disk.
• Crashes the computer terminals arbitrarily.
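The shell command in item (1) appears to follow the layout of a Windows autorun.inf file, in which keys such as open, shellexecute and shell\...\command name a program to launch from removable media. A minimal sketch of how such a file could be screened is given below. The function name is hypothetical, the rule of flagging only ".exe" targets is a simplifying assumption, and the repeated backslashes of Figure 6 are reduced to single ones in the usage example.

```python
def autorun_findings(text):
    """List (key, value) pairs in autorun.inf-style text that launch an .exe."""
    findings = []
    for raw in text.splitlines():
        if "=" not in raw:
            continue  # section headers and blank lines carry no assignment
        key, value = (part.strip() for part in raw.split("=", 1))
        key_l = key.lower()
        launches = key_l in ("open", "shellexecute") or (
            key_l.startswith("shell") and key_l.endswith("command")
        )
        if launches and value.lower().endswith(".exe"):
            findings.append((key, value))
    return findings
```

Applied to a file containing `shell\open\command=VEROVALA\nebih.exe` and `icon=SHELL32.dll, 4`, the sketch reports only the first line, since the icon entry does not launch anything.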


Currently, a number of digital tampering detection techniques
are available [16]. Interestingly, for further confirmation,
when the author contacted www.virscan.org on 25th September
2010 and submitted a sample of NEBIH.EXE, the result was
astonishing. Out of 35 malware scanners existing on the
website, none was able to show that it is a scareware or a
malware. Every one showed it as a clean file. However, on 27th
September 2010, when the author again contacted the
abovementioned website, the result was that 11%, i.e., 4
out of 35 scanners, found it to be malware.


5. SCAREWARE: IMPLICATIONS 

Tracking digital footprints of the scareware samples under study
indicates that a Denial of Service (DoS) attack using the
Universal Plug and Play (UPnP) NOTIFY directive can be
carried out by sending a malicious UDP packet to port 1900
containing a Simple Service Discovery Protocol (SSDP)
advertisement. An attacker can force the Windows client to 
connect back to a specified IP address and pass on a 
specified Hypertext Transfer Protocol (HTTP) or Hypertext 
Transfer Protocol Secure (HTTPS) request. If the system that 
the victim is attempting to contact for the device description is 
configured to "echo" such requests, the system will enter an 
infinite download loop that will quickly consume the system's 
resources and cause it to crash. It may be mentioned that 
denial of service is considered as one of the most difficult 
attacks to detect [18]. 
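From the defender's side, the attack described above hinges on the LOCATION header of the SSDP NOTIFY message, which tells the client where to fetch the device description. A minimal detection sketch is given below, assuming that on a home or office network a legitimate description URL points to a private address. The function names and that policy are assumptions introduced here, not part of the study.

```python
import ipaddress
from urllib.parse import urlsplit


def ssdp_location(message: str):
    """Return the LOCATION header of an SSDP NOTIFY message, or None."""
    lines = message.split("\r\n")
    if not lines[0].upper().startswith("NOTIFY "):
        return None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().upper() == "LOCATION":
            return value.strip()
    return None


def location_is_local(location: str) -> bool:
    """True only if the description URL points to a private IP address."""
    host = urlsplit(location).hostname or ""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # host names are treated as non-local in this sketch
```

A monitoring tool could log or drop any NOTIFY message for which `location_is_local` is false, rather than letting clients connect back to an arbitrary external address.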


Additionally, a distributed denial of service (DDoS) attack 
using the UPnP NOTIFY directive can also be launched. It 
is similar to the first exploit, except the attacker sends the
SSDP announcement to broadcast and multicast addresses.
Multiple machines reply to the IP address to obtain the device
description, performing a DDoS attack against the system. This
was seen when the author tested the scareware in a networking
environment where multiple systems were connected to the
Internet.


CONCLUDING REMARKS 

Malware is widely available on the internet for anyone wanting
to cause mischief, theft, espionage or cyber-crime. The
majority of internet users worldwide have fallen victim and
they feel incredibly powerless against faceless cyber criminals.
The fundamental issue is that there is a law enforcement model
that’s geographically based, but there’s no geography on the
internet. An in-depth data analysis of crimes of persuasion,
including the case study under reference, demonstrates that
many of the popular cyber-crimes are related to data theft and
identity theft and display a judicious mix of cyber hypnotism
and malwares. Such crimes can be handled, to a large extent,
with constant vigilance at individual level in combination with
safe security practices including deployment of malware threat
mitigation controls. On the contrary, sophisticated cyber-attack
in the form of scareware demands a better and more
coordinated strategy on national level, which is required to be
implemented by developing a suitable corporate defense plan
including involvement of cyber vigilantes to ensure stronger
cyber security. It is evident from the present study that in such a
scenario, when the branded and reputed anti-virus/anti-spyware
vendors are unable to detect such well-crafted and encrypted
malware planted by the clever but malicious entrepreneurs, the
common man is left with no choice. Neither the local law
enforcement agencies are equipped to cater to the victim’s need,
nor the government with their policies. Furthermore, the
scareware scam is hard for police or other law enforcement
agencies to investigate because the individual sums of money
involved are minuscule. Nonetheless, these cyber criminals
strike time and again. Hence, the author advocates that to
discourage criminals and to instil faith in the digital medium at
large, there is an urgent need to coordinate all cyber vigilantes.
Undoubtedly, proactive security is the need of the hour, wherein the
central government may act as a regulatory authority for these
cyber vigilantes, supposedly high tech cyber security experts,
who, in coordination, may prove valuable assets to
safeguard the national economy in an unsafe cyberspace and
to disseminate knowledge and information related to it.
ABSTRACT

A typical Geographic Information System (GIS) is an information
system that integrates, stores, edits, analyzes, shares and
displays geographic information for effective decision making.
The focus here is to refine the storing and retrieving
capabilities of any GIS. GIS applications have very high
performance and scalability requirements, such as a query
response time of less than 3 seconds, 120000 customer sessions
per hour and 100000 data additions/updates per day. Also, an
ideal GIS application always deals with high concurrent load,
frequent database access for mostly read-only data, and non-
linear growth of mostly read-only data over a period of time.
These are all factors which impact performance in
the application. This research proceeds to understand how the
In-Memory Data-Grid solution is better than other solutions
and how it can be leveraged to implement very high
performing and highly scalable GIS applications.


KEYWORDS 
In-memory data grid, Cache memory, Geographic Information 
system (GIS), Distributed cache 


1. INTRODUCTION 

Geographic Information System, commonly known as GIS, is a
computer system capable of capturing, storing, analyzing, and
displaying geographically referenced information, that is, data
identified according to location. Practitioners also define a GIS
as including the procedures, operating personnel, and spatial
data that go into the system.

A GIS application [7] requires low response time, very high
throughput, predictable scalability, continuous availability and
information reliability, all of which can be provided by an In-Memory
Data Grid.

An In-Memory Data Grid is a data grid that stores information
in memory in order to achieve very high performance, and uses
redundancy, by keeping copies of that information
synchronized across multiple servers, in order to ensure the
resiliency of the system and the availability of the data in the
event of server failure [5].

Over the last few years, In-Memory Data Grids have become
an increasingly popular way to solve many of the problems
related to performance and scalability, while improving
availability of the system at the same time. An In-Memory Data
Grid eliminates single points of failure and single
points of bottleneck in the application by distributing the
application's objects and related processing across multiple
physical servers.

One of the easiest ways to improve an application's performance is
to bring data closer to the application, and keep it in a format
that the application can consume more easily.

Most enterprise applications are written in one of the
object-oriented languages, such as Java or C#, while most data is
stored in relational databases, such as Oracle, MySQL or SQL
Server. This means that in order to use the data, the application
needs to load it from the database and convert it into objects.
Because of the impedance mismatch between tabular data in
the database and objects in memory, this conversion process is
not always simple and introduces some overhead, even when
sophisticated O-R mapping tools, such as Hibernate or
EclipseLink, are used.

Caching objects in the application tier minimizes this
performance overhead by avoiding unnecessary trips to the
database and data conversion. This is why all production-quality
O-R mapping tools cache objects internally and
short-circuit object lookups by returning cached instances
whenever possible.


2. PROBLEM STATEMENT

2.1 INTRODUCTION TO THE PROBLEM 

Customer expectations of GIS systems have evolved
significantly over time [4]. Today customers
expect a better and faster online experience.

Several architectures have been proposed to retrieve necessary,
relevant and useful information efficiently while at the same
time providing a scalable platform for GIS applications. However,
the results of these architectures generally become
unsatisfactory and prone to performance loss over
time. As the customer base increases, performance
starts to degrade.


3. PROPOSED SYSTEM 

The proposed system aims to incorporate a technology
called distributed cache into a GIS application. This technology
will not only boost the performance of the application but will also
provide many more features to it. The first step in our paper is a
strong research base of prevalent architectures, and the second is an
in-depth study of distributed cache technology. After this
research we try to demonstrate our concept through a small proof
of concept.

If we are able to incorporate a distributed cache in a GIS
application, the following features would be achieved:

1. Low response time
2. High throughput
3. Elimination of bottlenecks
4. Predictable scalability
5. Continuous availability
6. Failover support
7. Information reliability


4. LITERATURE REVIEW 
A simple database retrieval architecture is still the backbone of
most of the complex architectures in use today [8]. A GIS
application generally consists of a program running on a server
and connected to a database. A number of users connect to
this program to query, update, delete or add different items.
Initially, the application was deployed on a single server that could
support a limited number of users at a time. No matter how
powerful a server is, it has some limit on the
number of users it can support; as an example,
we assume here that the server could support 5000 users at
a time.
There is a further limit on the number of users whose requests
require access to the database and can be processed
simultaneously. The reason is that database connections
are generally heavyweight, so opening many connections
to the database at the same time is not feasible. To use
database connections efficiently, developers generally
use connection pools and set a limit on the number of
connections that can be active at a time. Other than
performance issues, the other issues with this architecture
were:

• SPOF, which stands for single point of failure. In this
architecture there were three single points of failure:
the application, the database and the server. If the database,
the server or the application crashed, the complete
application would be down and no one would be able to
access or use it.

• Shared resources were always performance bottlenecks: the
greater the number of connections/users a shared resource
has, the greater the effect on performance.
In this architecture, the database was a shared
resource that could not support a large number of users at the
same time.

• Another reason for low performance with the basic
architecture was the step required to convert data stored in the
database into application objects when a user queries for data,
and the reverse step required to read application
objects in order to store data in the database.

To increase the number of users supported by the application, a load
balancer was introduced [2]. The load balancer's responsibility is
to distribute the load efficiently among the different
servers/applications capable of processing the request. In this
architecture the load balancer runs on one system and
the GIS application is deployed and run on more than one server,
each of which connects to a single database. The load balancer
forwards each user request to one of the servers configured with
it, based on the load of each server. The use of a load
balancer tremendously increased the number of users that could
simultaneously connect to the application, as the number of servers
running the application was increased. But there were still large
performance issues with this architecture [9].

There was still a limit on the number of users whose requests required
access to the database, because of the limit on the number of
connections that could be made simultaneously to the database.

A SPOF also still existed. The server running the application was
no longer a single point of failure, as there were several servers
that could keep the application
running even if any one server, or the application running on it,
crashed. Such a crash would remain transparent to users, as users were
no longer interacting with the server hosting the application
but with the load balancer. When the load
balancer learned that one of the servers was down, it
would exclude that server from its list of active servers
and stop delegating user requests to that
server. But the database was still a single point of failure: because
a single database was used, if that database crashed, the
application would fail.
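
The failover behaviour of the load balancer described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class `LoadBalancer` and its methods are hypothetical names, and a simple round-robin policy is used in place of the load-based selection mentioned in the text.

```python
class LoadBalancer:
    """Toy round-robin balancer that skips servers marked as down (hypothetical)."""

    def __init__(self, servers):
        self.active = list(servers)   # servers currently eligible for requests
        self._next = 0                # counter used to pick the next server

    def mark_down(self, server):
        # Exclude a crashed server so no further requests are delegated to it.
        if server in self.active:
            self.active.remove(server)

    def route(self, request):
        # Return the server that should handle this request.
        if not self.active:
            raise RuntimeError("no application server available")
        server = self.active[self._next % len(self.active)]
        self._next += 1
        return server


lb = LoadBalancer(["app-1", "app-2", "app-3"])
before_crash = [lb.route(f"req-{i}") for i in range(3)]
lb.mark_down("app-2")                      # the balancer learns that app-2 is down
after_crash = [lb.route(f"req-{i}") for i in range(4)]
```

Users keep sending requests to the balancer, so the removal of `app-2` is invisible to them; the shared database behind all servers remains a single point of failure, which this sketch does not address.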

Cris J. Holdorph [3] described an approach that works with distributed
databases instead of a single database. In this scheme,
each server connected to the load
balancer has its own database.

Although the number of users that could be supported
increased compared to the schemes discussed above, this
scheme requires an extra process to replicate the
data stored in one database to the other databases. This is
required to handle the scenario in which user requests are sent
to different databases: the result sent back should be consistent
and independent of which database served the request.

Jim Handy [6] defined a scheme in which multiple servers
are connected to a single cache, which is in turn connected to
the database.

The number of connections that could be made increased
(though this number depends on the server on which the cache is
hosted). Read queries would also be much faster, and the
performance of write queries to the database would be
improved if updates were applied to the cache synchronously and
saved to the database asynchronously by some other process.
But there were still some disadvantages to this scheme:
the cache and the database were each still a single point of failure, and if
either crashed, the application would not be available. In addition,
data-intensive queries would run over the complete data in the cache,
which was not very efficient.

In the recent past, the concept of the In-Memory Data Grid and
related products have become popular; they can be
used to improve the performance of applications that are heavily
affected by database operations and perform mostly read-only
operations [8]. In GIS applications most of the requests sent to the
server are read-only requests that require reading something
from the database; an insert/update command is used only when a new
point is located.

Paul Colmer [5] described the features provided by an In-Memory
Data Grid that make it a good choice for GIS applications.
An In-Memory Data Grid achieves low response time for data
access by keeping the information in memory and in
application object form, and by sharing that information across
multiple servers. In other words, applications may be able to
access the information that they require without any data
transformation step.

Performance is further improved by coalescing multiple changes
to a single application object and batching multiple modified
application objects into a single database transaction, meaning
that a hundred different changes to each of a hundred different
application objects could be persisted to a database in a single,
large and thus highly efficient transaction [10].

Arindam Chakravorty [1] discussed various topologies in which a
cache could be used to overcome the limitations of the above
schemes. An In-Memory Data Grid supports three cache topologies:
Distributed, Near and Replicated.
In a Distributed cache, each node in the cluster
contains a unique set of application data in the cache. To scale
the capacity of the cache, more nodes are added to the cluster. Any
type of cache involves serialization/de-serialization and
network transfers when application data is read from or written to
the cache. A Distributed cache is best when the application
requires a heavy volume of reads and writes of application data.
The Distributed Cache architecture is shown in Figure 1.





Figure 1: “Distributed Cache Architecture” 


In a Near cache, each client node holds a small amount of data
in a local cache and a larger amount of data in the distributed
cache, and these caches are synchronized with each other. There
is some overhead involved in synchronizing the caches.

In a Replicated cache, each node in the cluster contains all the
application data in the cache. A Replicated cache is best when the
application requires a small amount of application data and mostly read
access to the cache.
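
The difference between the Distributed and Replicated topologies can be illustrated with a small sketch. It rests on assumptions not found in the source: the helper names are hypothetical, keys are assigned to nodes by a CRC32 hash modulo the node count, and backups and rebalancing, which real data grids perform, are left out.

```python
import zlib


def owner_node(key, nodes):
    """Distributed topology: each key is owned by exactly one node."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def place_distributed(entries, nodes):
    # Every entry is stored once, on the node that owns its key.
    caches = {node: {} for node in nodes}
    for key, value in entries.items():
        caches[owner_node(key, nodes)][key] = value
    return caches


def place_replicated(entries, nodes):
    # Every node holds a full copy of the application data.
    return {node: dict(entries) for node in nodes}


nodes = ["node-a", "node-b", "node-c"]
entries = {f"parcel:{i}": {"id": i} for i in range(12)}   # illustrative GIS records
distributed = place_distributed(entries, nodes)
replicated = place_replicated(entries, nodes)
```

In the distributed placement the total capacity grows with the number of nodes, whereas in the replicated placement every read is local but each node must hold the whole data set.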


5. GRID CLUSTER ARCHITECTURE

5.1 GRID CLUSTER ARCHITECTURE
In-Memory Data Grids [8] are built on a fully clustered
architecture. The grid is based on a peer-to-peer clustering
protocol that can be pictured as a conference room in which the
servers are the parties. In this model, servers are capable of:

• Speaking to Everyone: When a party enters the conference
room, it is able to speak to all other parties in the conference
room.

• Listening: Each party present in the conference room can
hear messages that are intended for everyone, as well as
messages that are intended for that particular party.

• Discovery: Parties can only communicate by speaking and
listening; there are no other senses. Using only these means,
the parties must determine exactly who is in the conference
room at any given time, and parties must detect when new
parties enter the conference room.

• Working Groups and Private Conversations: Although a
party can talk to everyone, once a party is introduced to the
other parties in the conference room (i.e. once discovery has
completed), the party can communicate directly with any set
of parties, or directly with an individual party.

• Death Detection: Parties in the conference room must
quickly detect when parties leave the conference room, or
die.

Using the conference room model provides the following
benefits:

• There is no configuration required to add members to a
cluster. Any program running the grid application
automatically joins the cluster when it starts and is able to access the
caches and other services provided by the cluster. When a
program joins the cluster, it is called a cluster node, or
alternatively, a cluster member.

• Since all cluster members are known, it is possible to
provide redundancy within the cluster, such that the death
of any one node does not cause any data to be lost.

• Since the death or departure of a cluster member is
automatically and quickly detected, failover occurs very
rapidly, and more importantly, it occurs transparently,
which means that the application does not have to do any
extra work to handle failover.

• Since all cluster members are known, it is possible to load
balance responsibilities across the cluster. The grid does this
automatically by distributing the load evenly across the cluster.
Load balancing occurs automatically in response to new
members joining the cluster, or existing members leaving
the cluster.


6. READ-THROUGH CACHING 
When an application asks the cache for an entry, for example
the key X, and X is not already in the cache, the data grid
automatically delegates to the cache-store, which is responsible
for loading data into the cache, and the cache-store loads
X from the underlying datasource.

If X exists in the datasource, the cache-store loads it and returns
it to the data grid, which places it in the cache for future use
and returns it to the application code that
requested it. This is called Read-Through caching.
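
The read-through flow can be sketched as below. This is a minimal illustration and not the API of any particular data grid: the class `ReadThroughCache`, the `load_from_store` callback standing in for the cache-store, and the `store_reads` counter are all hypothetical.

```python
class ReadThroughCache:
    """Minimal read-through cache: misses are loaded via a cache-store callback."""

    def __init__(self, load_from_store):
        self._entries = {}
        self._load = load_from_store   # hypothetical cache-store loader
        self.store_reads = 0           # counts trips to the underlying datasource

    def get(self, key):
        if key in self._entries:
            return self._entries[key]          # cache hit: no datasource access
        self.store_reads += 1
        value = self._load(key)                # cache miss: delegate to the cache-store
        if value is not None:
            self._entries[key] = value         # keep it for future requests
        return value


datasource = {"X": "point(77.2, 28.6)"}        # stands in for the database
cache = ReadThroughCache(datasource.get)
first = cache.get("X")     # loaded from the datasource
second = cache.get("X")    # served from the cache
```

The application code only ever calls `get`; whether the value came from memory or from the datasource is hidden behind the cache.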
7. WRITE-THROUGH CACHING

Coherence, an In-Memory Data Grid product, can handle updates to the
datasource in two distinct ways, the first being Write-Through [2].

In this case, when the application updates a piece of data in the
cache, the operation does not complete (i.e. the put does not
return) until the data is also persisted to the underlying datasource.
This does not improve write performance at all, since the user
is still dealing with the latency of the write to the
datasource [10].
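
A write-through put can be sketched in a few lines. The class below is a hypothetical illustration, with a plain dictionary standing in for the datasource; it is not the Coherence API.

```python
class WriteThroughCache:
    """Minimal write-through cache: put() returns only after the datasource write."""

    def __init__(self, datasource):
        self._entries = {}
        self._datasource = datasource   # any dict-like store; stands in for the database

    def put(self, key, value):
        # Persist synchronously first. If this write raises, the exception
        # propagates to the caller and the cache is left unchanged.
        self._datasource[key] = value
        self._entries[key] = value

    def get(self, key):
        return self._entries.get(key)


database = {}
wt_cache = WriteThroughCache(database)
wt_cache.put("X", 1)   # the caller waits for the datasource write to finish
```

Because the datasource write sits inside `put`, the caller pays its full latency on every update, which is why this mode gives no write speed-up.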


8. REFRESH-AHEAD CACHING 

In the Refresh-Ahead scenario, Coherence allows a developer
to configure the cache to automatically and asynchronously 
reload (refresh) any recently accessed cache entry from the 
cache loader prior to its expiration. 

The result is that once a frequently accessed entry has entered 
the cache, the application will not feel the impact of a read 
against a potentially slow cache store when the entry is 
reloaded due to expiration. The refresh-ahead time is 
configured as a percentage of the entry's expiration time; for 
instance, if specified as 0.75, an entry with a one minute 
expiration time that is accessed within fifteen seconds of its 
expiration will be scheduled for an asynchronous reload from 
the cache store. 
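
The arithmetic of the example above can be checked with a short sketch. The function names are hypothetical, and the handling of an already expired entry (a synchronous reload) is an assumption of this sketch rather than something stated in the text.

```python
def refresh_window_start(expiry_seconds, refresh_ahead_factor):
    """Entry age (in seconds) from which an access triggers an asynchronous reload."""
    return expiry_seconds * refresh_ahead_factor


def action_on_access(entry_age, expiry_seconds, refresh_ahead_factor):
    if entry_age >= expiry_seconds:
        # Assumption: an expired entry must be reloaded before it can be served.
        return "synchronous reload"
    if entry_age >= refresh_window_start(expiry_seconds, refresh_ahead_factor):
        return "serve cached value, schedule asynchronous reload"
    return "serve cached value"


# One-minute expiration with a factor of 0.75: the window opens at 45 seconds,
# i.e. within the last fifteen seconds before expiration.
start = refresh_window_start(60, 0.75)
```

An access at an entry age of 50 seconds therefore returns the cached value immediately and schedules a background reload, so a later reader does not wait on the cache store.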


9. WRITE BEHIND CACHING 

In the Write-Behind scenario, modified cache entries are 
asynchronously written to the datasource after a configurable 
delay, whether after 10 seconds, 20 minutes, a day or even a 
week or longer. 

For Write-Behind caching, the grid generally maintains a write-behind
queue, or a similar data structure, which stores the data that
needs to be updated in the datasource. When the application
updates X in the cache, X is added to the write-behind queue (if
it isn't there already; otherwise, it is replaced), and after the
specified write-behind delay the data grid service updates the
underlying datasource with the latest state of X.

Note that the write-behind delay is relative to the first of a
series of modifications; in other words, the data in the
datasource will never lag behind the cache by more than the
write-behind delay.

The result is a "read-once and write at a configurable interval"
(i.e. much less often) scenario. There are four main benefits to
this type of architecture:

• The application improves in performance, because the user
does not have to wait for data to be written to the
underlying datasource.

• The application experiences drastically reduced database
load: since the number of both read and write operations is
reduced, so is the database load. Reads are reduced by
caching, as with any other caching approach. Writes,
which are typically much more expensive operations, are
often reduced because multiple changes to the same object
within the write-behind interval are "coalesced" and
written only once to the underlying datasource
("write-coalescing"). Additionally, writes to multiple cache entries
may be combined into a single database transaction.

• The application is somewhat insulated from database
failures: the Write-Behind feature can be configured in
such a way that a write failure will result in the object
being re-queued for write. If the data that the application is
using is in the cache, the application can continue
operation without the database being up.

• Linear scalability: for an application to handle more
concurrent users, only the number of
nodes in the cluster needs to be increased; the effect on the database in terms of
load can be tuned by increasing the write-behind interval.
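
The write-behind queue and write-coalescing described in this section can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the class and method names are hypothetical, time is passed in explicitly as `now` so that the example is deterministic, and re-queueing after a failed write is omitted.

```python
class WriteBehindCache:
    """Minimal write-behind cache with coalescing of pending writes."""

    def __init__(self, datasource, delay_seconds):
        self._entries = {}
        self._queue = {}                 # key -> (latest value, time of first pending change)
        self._datasource = datasource    # dict-like stand-in for the database
        self._delay = delay_seconds
        self.store_writes = 0            # counts writes that reach the datasource

    def put(self, key, value, now):
        self._entries[key] = value       # the caller returns immediately
        if key in self._queue:
            # Coalesce: replace the pending value but keep the time of the
            # first modification, so the delay is relative to that change.
            _, first_change = self._queue[key]
            self._queue[key] = (value, first_change)
        else:
            self._queue[key] = (value, now)

    def flush_due(self, now):
        # Write every entry whose delay has elapsed; return the keys written.
        due = [k for k, (_, t) in self._queue.items() if now - t >= self._delay]
        for key in due:
            value, _ = self._queue.pop(key)
            self._datasource[key] = value
            self.store_writes += 1
        return due


db = {}
wb = WriteBehindCache(db, delay_seconds=10)
wb.put("X", 1, now=0)
wb.put("X", 2, now=3)
early = wb.flush_due(now=5)    # nothing is due yet
wb.put("X", 3, now=9)
late = wb.flush_due(now=10)    # ten seconds after the first change to X
```

Three updates to X result in a single datasource write carrying the latest state, and that write happens no later than the configured delay after the first of the three modifications.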


10. STATISTICS FOR COMPARISON 

10.1 DATABASE AND IN-MEMORY DATA GRID 
PERFORMANCE 

The comparison shows that when the data is stored
conventionally in a database, the processing time is greater
than when it is stored in the cache. The results are
shown in Figure 2 and Figure 3.


CONCLUSION 

An effective caching mechanism is the foundation of any
distributed-computing architecture. The focus of this article
was to understand the importance of caching in designing an
effective and efficient distributed architecture. The in-memory data
grid method was then implemented for this purpose. It has been
observed that the retrieval time of a GIS application's data saved
using the in-memory data grid method is much less than
when the data is saved using the conventional database storage
method. Thus, the use of distributed cache technology for
spatial data storage will boost the performance of GIS
applications.


FUTURE SCOPE 

Object-relational mapping (ORM) is a way to bridge the impedance
mismatch between object-oriented programming (OOP) and
relational database management systems (RDBMS). Many
commercial and open-source ORM implementations are
becoming an integral part of contemporary distributed
architectures. ORM technologies are becoming part of
mainstream application design, adding a level of abstraction.
Implementing an ORM-level cache will improve the performance
of a distributed system. Therefore, this method can be used to
improve the performance of GIS applications. In future, the
digitized data required for a GIS application can be stored
using the proposed Triangular Pyramid Framework for Enhanced
object relational vector data model [3] under the distributed
cache environment using an in-memory data grid for better
results.
ABSTRACT

Knowledge is awareness at a higher level of abstraction [5, 6, 7].
It has tacit and explicit components. The concern lies in the conversion
of tacit knowledge to explicit form and its scientific
management, since it is presently identified as an economic
entity without diminishing returns. Advancement in knowledge
exercise contributes immensely towards the socio-economic
development of a community. This motivates exploring ways to
generate new scientific knowledge with evolving technologies.
Communication is one approach among many alternatives to evolve
new scientific knowledge through inter- and intra-entity data and
information exchange. Data communication using digital
technology has in recent years attained a ubiquitous dimension,
and its effect on knowledge generation has grown enormously
[3, 19], resulting in a need for enhanced attention. The present text is
an attempt in a similar vein.
1. INTRODUCTION

Civilization has passed through agrarian, semi-industrialization,
industrialization and advanced industrialization phases and
arrived at the information age, where power has shifted from
industrial production to information and knowledge production and
management, leading to the ‘Information Society’ concept, wherein
creation, distribution, and manipulation of information and
knowledge have become the most significant economic and
cultural activity [6, 7].

Ahmad, Mazida et al. [02] convey that Nonaka and Takeuchi,
analyzing data from top Japanese industries, hold that
knowledge creation involves processes of interaction and
transaction of tacit and explicit knowledge between experts and
novices, employing the processes of Socialization,
Externalization, Combination, and Internalization (SECI).
There are different interpretations of knowledge. To have
knowledge one needs to add value to data or information. This
brings in the role of management in the entire process. As
knowledge is awareness at a higher level of abstraction, to
acquire it institutionally the collective business goal must be in
focus, which in turn calls for a mechanism to facilitate fluent intra- and
inter-entity flow of data communication [6, 7, 8].
The knowledge abstraction depicted in Figure 1 is a process
of awakening. The efficiency with which it is attained, in
terms of content and speed, varies amongst entities. In the
figure, arrowed lines indicate flow of communication. Digital
communication has become an important mode of knowledge
exchange today. A comparatively more knowledgeable
institution or entity needs less time and effort to
traverse the information hierarchy depicted below in
Figure 1 [6]. This requires skills for detailing, consolidation and
communication. The speed and accuracy with which an
individual or an institution consolidates details to evolve
knowledge, or traverses in reverse order from consolidation to
detailing, reflects its intelligence. The processes involved
are both tangible and intangible in form.

Knowledge has both tacit and explicit components to deal with,
making its management very difficult. Exchange of thoughts,
views and opinions through effective communication has
always enhanced knowledge. The process of knowledge
abstraction and its ramification is at times intangible and varies
amongst entities or institutions. This makes the tacit knowledge
component somewhat hard to grasp and its conversion to explicit form
a challenge [4, 6].


[Figure 1 image not recovered; surviving labels: Consolidation, Transaction Support System]
Figure 1: “Institutional Intelligence System”


To avoid knowledge loss or distortion, its tacit part needs to be
communicated widely for better storage and editing in
cognitive space through discourse, debate, discussion, and
deliberation. Alternatively, its conversion to explicit form, in
black and white or digital structure, and communication over a
wide domain offer the opportunity for future editing,
upgradation, retrieval and reuse. One may use either of the two
techniques or both, in tandem. From time immemorial, tools
have been sought, used and improved upon to make this happen.
Today ICT, amongst other available tools, with its power,
agility and ubiquity provides one of the best options.

The issues at hand are to process data, information and
knowledge and then share them on a fast track. ICT works as a
catalyst in this process, which has gone through an evolution
involving the first, second, third, fourth and fifth generations of
computing [1, 3]. This has moved knowledge-generating
practices from standalone mode to the Web1, Web2 and then Web3
platforms [21]. Exchange of thoughts and
views over the Internet has worked as a guiding force, framing public
opinion in recent years and leading to notable changes in the social
and socio-economic system. The recent upheaval in Middle East
countries can be considered a case in point. Facebook, Twitter,
Wiki etc. are platforms on the Internet and World Wide Web
where views and opinions are exchanged, edited and given
shape by stakeholders to evolve a collective perspective [26].
Discussions and deliberations on digital social networks like
ResearchGate are immensely popular amongst serious thinkers
with a strong research orientation. On ResearchGate, with over
a million members on board and publications covering various
subjects from ‘Mathematics’ to ‘Literature’, knowledge gets
generated, reviewed and updated online. This makes instant
conversion of tacit knowledge to its explicit form possible.
However, to be able to use these platforms effectively one
needs to possess the necessary ICT tools, like a computer system
and the means to connect it to a strong digital network [12].


[Figure 2 image not recovered; surviving labels: Tacit Knowledge Store, Explicit Knowledge Store, Print/Digital Store]
Figure 2: Knowledge Stores


2. KNOWLEDGE OBJECT WITH ICT AS AN 
ENABLER 

From an abstract entity, “Knowledge” has in this era evolved into
an object with material and economic value, a fact recognized
by international institutions. The World Bank derived the Knowledge
Index and Knowledge Economy Index (KI & KEI) of countries
considering 83 factors affecting their respective socio-economic
conditions and ranked them accordingly
during 2008 and 2009 [11]. According to these reports, ICT is
increasingly affecting socio-economic and cultural exercises,
leading to advanced knowledge.

IBM and the Economist Intelligence Unit (EIU) regularly
published data on the ICT preparedness of countries in the world
and the related rankings from 2001 to 2009 [10].

In the year 2010, in keeping with the changing scenario, it
published a digital economy ranking of member countries [10,
15]. Recent reports of the International Telecommunication Union
(ITU) [16, 17] of 2010 and 2011 on measuring the information
society acknowledge the role of ICT in enhancing socio-economic
growth. They observe that, if applied appropriately, ICT can be a
development enabler, critical for a country attempting to
transform itself into a knowledge society.

Evolving computing and communicating techniques have 
affected Human-Computer relationship at various stages, which 
opened new avenues for knowledge generation. In the 
following section these aspects are briefly touched upon. 


3. COMPUTING 

The first-generation computers of the 1940-56 era used vacuum
tubes for circuitry and magnetic drums for memory.
Programming was done using machine language. Transistors
replaced vacuum tubes in second-generation computers during
1956-63. Assembly language was used for programming
such computers. High-level languages like FORTRAN and
COBOL were developed during this time to make usage
easier. Miniaturized transistors placed on silicon chips,
called integrated circuits, ushered in the era of third-generation
computers during 1964-71. Use of integrated circuits helped in
radical enhancement of computing speed. Fourth-generation
computers came into existence from 1971 with the arrival of
microprocessors and are still in use; they also brought in GUI
features, the mouse and handheld devices [24]. The evolution of
various technical standards of computing enabled collaboration
between technologies, pushing growth faster. Fifth-generation
computers involving quantum computing and
nanotechnology are in the course of evolution. These efforts
stepped up the speed, efficiency and volume of information
processing and at the same time elevated the quality of the
process involved in information dispensation [19, 24].


4. COMMUNICATING 

Evolution of the Internet started in the 1950s and 1960s along with
the development of computers. Initially this was to facilitate
point-to-point communication between mainframe computers
and their access points, nodes or terminals. Later it expanded
to aid connections between computers, leading to early research
into packet switching. During 1970 Donald Davies developed a
packet-switched network called Mark I to support NPL
(National Physical Laboratory). This was later improved to
Mark II in 1973 and remained in operation till 1983. Larry
Roberts of the Advanced Research Projects Agency in the United
States took the technology ahead to ARPANET, which later
evolved into the Internet. In 1982 the Internet Protocol Suite TCP/IP
was standardized and the concept of a worldwide network over
TCP/IP came into existence. Berners-Lee developed a worldwide
information medium during 1990. He built all the necessary tools
for working on the web, like Hyper Text Transfer Protocol
(HTTP), Hyper Text Mark-up Language (HTML) and the first
web browser (WorldWideWeb). Around this time, during
1980-1990, commercial Internet Service Providers (ISPs) emerged
[22, 23].

Efforts on standardization of computing and communicating
practices became increasingly significant from this point on,
allowing diffusion of computer and communication technology
into all aspects of human life, ranging from culture to
commerce, through fast exchange of data, information and
messages [15, 19, 22].

Mark Weiser conceived the phenomenon of computing existing
in every aspect of human life without any conscious
reference and coined the term ‘ubiquitous computing’. In one
of his articles in “Scientific American” during 1991 he
expressed that all profound technologies will become so
commonplace, and be taken for granted so much, that all will
become oblivious to their existence. Later, ubiquitous computing and
related fields like wearable computing and augmented reality
have become major emerging areas of HCI research
[28, 29].


5. PARADIGM SHIFT IN HUMAN VS COMPUTER 
RELATIONS 
Electronic computing with mainframe systems had multiple
users sharing a centralized computing facility, giving a ‘one
computer to many users’ relationship. The development of
microprocessors enabled the creation of personal computers,
leading to a ‘one computer to one user’ relationship where each
user possessed one computer to execute specific personal tasks.
Evolution of the TCP/IP protocol standard made it possible to connect
personal computers over a digital network, helping real-time
information sharing. The paradigm shifted further with the advent
of the Internet and World Wide Web, which allowed
interconnection of computers across the world,
leading to a ‘one user to many computers’ environment. Evolution
of standards and protocols contributed immensely to these
efforts.
Miniaturization of microprocessors enabled embedded
computing ability in various devices of day-to-day use, leading
to the ubiquitous computing paradigm [03, 28, 29]. This allowed
computers to pervade every aspect of life. To take
things further, virtualization and cloud computing
emerged with offerings like Software (SaaS), Infrastructure
(IaaS) and Platform (PaaS) as a service to shape the paradigm shift
of computing to the fourth generation and beyond [8, 12]. These
developments encouraged computer and Internet usage by reducing
individual resource liability.


6. KNOWLEDGE GENERATION ON VIRTUAL 
PLANE 

Data communication and text transfer protocols such as 
TCP/IP and HTTP paved the way for the Internet and the World 
Wide Web (WWW), leading to the Web1, Web2 and Web3 paradigms 
on which knowledge generation and exchange have become 
faster and simpler [21]. Standards and protocols, combined with 
advanced computing and networking features, extended human 
reach and the capability to express over a virtual plane beyond 
defined spatial boundaries and media restrictions [01, 18, 26]. 
The Web1 paradigm allowed us to create simple black and white 
static pages, while Web2 allowed colorful pages with dynamic 
content. The Web3 environment provides interactive web pages on 
which a piece of text can be created, edited or commented on 
instantly, so that tacit knowledge is converted into explicit form 
[21]. Computing capability and techniques over Web 2.0 and 3.0 
are emerging as a very effective tool to process data and retain 
information to create knowledge instantly online [08, 18, 19, 
21]. This is also handy for registering tacit knowledge in explicit 
form. 

To take things further, virtualization and cloud 
computing techniques attempt to make advanced 
computing resources ubiquitously available without involving 
end users in the complexities of the information storage and 
retrieval process. Leading ICT institutions and service 
providers such as Oracle, IBM, Microsoft and Google are coming 
up with virtualization and cloud computing options. Wikis, 
blogs and social networking over digital platforms are the modern 
day's podium for knowledge collaboration, tacit knowledge 
registration and up-gradation [12, 18]. 


7. VIRTUAL ENTERPRISING 

To remain competitive we need to identify what best helps us 
to traverse the path of wisdom, as shown in Fig. 1. Hendricks 
[14] opines that Information and Communication Technology 
(ICT), with hardware equipment and software solutions, can 
enhance knowledge sharing by lowering temporal and spatial 
barriers between knowledge workers, thus improving access to 
information on knowledge. He throws light on the differential 
effects of ICT on the motivation for knowledge sharing in 
different settings. It is also observed that the most successful 
companies are those that use their intangible assets faster and 
better. 

Christian Kreutz [27], the founder of Crisscrossed, indicated that 
tagging (marking), social bookmarking (networking), 
blogging (story telling), wikis (the white board) and RSS feeds 
are five tools for present-day knowledge sharing, for which 
one simply needs to possess a computing system and access to the 
Internet. This leads to the conviction that the overall structure 
for handling data and information is presently capable of 
accommodating more abstract inputs in the business decision 
making process, enhancing the cognitive level at one end and allowing 
quick ramification into detailed functional directions for 
effective business process execution at the other. This leads to 
the assurance that ICT facilitates ‘knowledge production’ with 
reasonable ease, and the effect can be seen in enterprise-level 
conceptualization. 

‘Enterprise 2.0’ is the new buzzword under which the concept of social 
business is being consolidated. Enterprise 2.0 uses Web 2.0 
within an organization to enhance collaboration, leading to 
streamlining of business processes. The "Enterprise 2.0" concept 
was coined by Harvard Business School professor Andrew 
McAfee in 2006 to portray how Web 2.0 technologies 
could be used in an organization's intranet and extranets. It is 
obvious now that in this era of collaboration, connectivity over 
the data communication network holds the key [19, 21, 25, 26]. 
This necessitates assessing ICT strength in general, and 
communication networks in terms of broadband connectivity in 
particular, to comprehend the state of the knowledge society. Today 
E-Readiness ranks and broadband connectivity statistics 
indicate accessibility to ICT, one of the modern platforms of 
knowledge exercise. 


8. ANALYSIS 

At this point, an assessment of knowledge and communication 
capability appears imperative. Country-wise measures of 
knowledge and communication strength are brought into focus 
in the following sections to analyze and comment on the 
context. 


8.1 CONNECTIVITY 

The reports of the International Telecommunication Union (ITU) 
[16, 17] of 2010 and 2011 on measuring the information society, 
the concept discussed in section 3 of the present text, find a 
tangible role for ICT in enhancing economic growth and 
socio-economic development. 

They observe that, if applied appropriately, ICT can be a 
development enabler, critical to transforming 
countries into knowledge societies, and this concept is pivotal to 
the measure IDI (ICT Development Index). According to the 
report, apart from productivity, ICT impacts other economic 
and socio-economic factors such as digital inclusion, access to 
knowledge and information, acquisition of skills increasingly in 
demand in a range of occupations, and even school 
performance. It considers that availability of infrastructure, access 
and effective use, with skill and intensity, form the core 
context that has an impact on the knowledge society. ITU 
member states consider ICT demand data an essential 
input to gauge ICT impact. According to the report, ICT is 
assisting in creating knowledge in many sectors such as 
agriculture, health, education and socio-economic growth. In 
the digital communication context it observed a significant 
increase in the use of both fixed and mobile broadband services in 
both the developing and the developed world. 
It underlines that growth in developing nations has been at a higher 
rate than in developed nations, though its usage remains 
non-measurable. The introduction of high-speed mobile Internet 
access in an increasing number of countries could further boost 
the number of Internet users, especially in the developing 
world. It finds that the number of mobile broadband subscriptions 
surpassed the number of fixed broadband subscribers in 2008, 
indicating a shift in usage pattern. The number of mobile 
broadband subscriptions refers to subscriptions that have access 
to a high-speed mobile network. The report finds that fixed 
broadband access is still largely confined to Internet users in 
developed countries. In the year 2009 broadband penetration 
stood at 23.3 per cent in developed countries compared to only 
3.5 per cent in developing countries. The gap between 
developed and developing countries appears even wider for 
mobile broadband penetration, with 38.7 and 3.0 per cent 
penetration, respectively. The report observes that the mobile 
broadband market in developed countries is dominated by 
Europe, accounting for 220 million mobile broadband 
subscriptions (over one third of the world’s total) [17]. 
Encouragingly, the report observed that ICT services have become 
more affordable worldwide and their usage has increased even in 
an era of economic downturn. Of the ICT services, fixed 
broadband service showed the largest price fall, 
followed by mobile cellular and fixed telephone services. The 
report observed that the countries with the highest broadband 
prices are all ranked relatively low in the ICT Development 
Index (IDI) [16, 17], putting forward the view that 
affordability of services is essential to build an inclusive information 
society. 

It further highlights that Internet access at home improves 
educational achievement and plays the role of a positive catalyst 
in socio-economic development. However, the broadband 
price gap between developing and developed nations remains 
enormous, and the service is least affordable in the developing world. The 
gap continues in the case of mobile cellular and Internet use, though 
usage is on the increase in the developing world. Since these are 
the platforms used for accessing the knowledge generating tools of 
the day, a look into the related statistics may be revealing 
[16, 17]. 


8.2 KNOWLEDGE ASSESSMENT 

The Knowledge Assessment Methodology (KAM) developed by the 
World Bank, and its indexes, are used to realize the inherent 
benefit in knowledge exercise. The KAM was designed by 
the Knowledge for Development (K4D) program to assess a 
country’s preparedness to compete in the knowledge economy 
using 83 (eighty-three) structural and qualitative variables. 

The KAM knowledge indexes comprise the Knowledge 
Economy Index (KEI) and the Knowledge Index (KI). The 
KEI considers whether or not the 
environment is conducive for knowledge to be used effectively 
for economic development. The KI measures a country's ability 
to generate, adopt and diffuse knowledge. This gives the planners 
of a country an opportunity to look into the state of the 
national knowledge exercise, which is responsible for defining the future 
growth framework, and to align the planning process to reap the best 
benefits from it [9, 10, 11]. Methodologically, the KI is the 
simple average of the normalized performance scores of a 
country or region on the key variables in three Knowledge 
Economy pillars: education and human resources, the 
innovation system, and information and communication 
technology (ICT). The ICT score, as registered by the World Bank, 
reflects the ICT preparedness of a country. Of the two knowledge 
indices, KI is taken into consideration in the present context, 
as an indication of the overall potential of knowledge 
development in a given country. 


8.3 ITU References 

The IDI (ICT Development Index) scores registered by the ITU 
(International Telecommunication Union) are the culmination of 
the ICT preparedness of a country in terms of infrastructure, ICT 
use (intensity) and ICT capability (skill). IDI reflects a 
nation’s preparedness towards evolving into an Information Society. 


8.4 Collective Perspective 

To allow better comprehension, the following table charts the 
Knowledge Index scores of the top ten KEI-ranked countries 
recorded by the World Bank for the year 2009 along 
with the corresponding ICT scores. The IDI scores of 2008 and 2010, as 
indicated by the ITU for the same countries, are also shown to 
depict the trend of IDI. The scores of India, along with other countries 
with neighboring scores, are tabled to present a window view of 
the scenario and a wider perspective. 


KEI Rank | Country | KI (Year 2009) | ICT (Year) | IDI (Year 2008) | IDI (Year 2010) 
Table-1 

*KI, ICT, IDI Scores on 0-10 scale; 

**KI, KEI, ICT Data Source - World Bank; IDI Data Source - 
ITU 

As discussed earlier, the World Bank developed the 
Knowledge Assessment Methodology (KAM) through 
the Knowledge for Development (K4D) program, and the indexes 
therein are used to realize the inherent benefit in knowledge 
exercise. It was designed to assess a country’s preparedness to 
compete in the knowledge economy, using 83 (eighty-three) 
structural and qualitative variables. It is important to note that, 
according to the study, the Knowledge Index (KI) measures a 
country's ability to generate, adopt and diffuse knowledge, and 
benchmarks a country’s position compared to others in the 
global knowledge economy, whereas the Knowledge Economy 
Index (KEI) considers whether or not the environment is 
conducive for knowledge to be used effectively for economic 
development in the country concerned. 

According to the studies in the K4D program, the Knowledge Index 
(KI) is the average of the rankings of the performance of a 
country or region in three areas of the so-called Knowledge 
Economy, namely education, innovation, and information and 
communications technology (ICT). Thus the KI of a country 
reflects the state of education, new knowledge creation in terms 
of research and development (R&D), patent registration etc., and the 
state of ICT exercise therein. ICT, in terms of preparedness, use 
intensity and capability, also figures in ITU studies and shapes 
IDI scores. This makes the study of ICT and IDI scores quite 
interesting. 
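
The averaging behind the KI can be illustrated with a minimal sketch. It assumes the three pillar scores are already normalized to the 0-10 scale used in the tables; the function name and the sample values are hypothetical and are not taken from World Bank data.

```python
def knowledge_index(education, innovation, ict):
    """Simple average of three normalized pillar scores (0-10 scale).

    Illustrative only: the pillar scores are assumed to be normalized
    already; the normalization step itself is not reproduced here.
    """
    scores = (education, innovation, ict)
    for s in scores:
        if not 0.0 <= s <= 10.0:
            raise ValueError("pillar scores are expected on a 0-10 scale")
    return sum(scores) / len(scores)


# Hypothetical pillar scores, chosen only to show the arithmetic.
example_ki = knowledge_index(education=8.0, innovation=9.0, ict=7.0)
```

Because each pillar carries equal weight in this sketch, a low ICT score pulls the KI down by exactly one third of the shortfall, which is why ICT and KI are read side by side in the tables.
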

In Table-1, countries are ranked according to the KEI of the 
year 2009. In that context Denmark comes out at 
the top vis-à-vis other countries in the world as far as the 
effectiveness of its environment for knowledge usage is 
concerned, though its KI score is less than that of Sweden. This 
indicates that Sweden was more capable than Denmark of 
generating, adopting and diffusing knowledge, though its environment 
was not as conducive to its usage in the year 2009. 

It appears that IDI scores are on the increase for advanced knowledge 
generating countries, which indicates that infrastructure, skill and 
intensity are on the increase in these countries. Juxtaposing ICT and 
IDI scores gives a window view of two different classes of 
assessment of the ICT preparedness of a country. Placing KI next 
to them helps to reflect on the effect of these scores on the Knowledge 
Index. As found in the study, India needs to cover more ground 
to figure in the elite segment. According to the study India figures 
at position 109, followed by Guatemala, Nicaragua etc. In the 
same year the ranking continued up to rank 146, which is 
held by Haiti. In Table-2 below, an attempt has been 
made to present the perspective of the Indian subcontinent and China 
for the same year, i.e. 2009. 


KEI Rank | Country | KEI (Year 2009) | KI (Year 2009) | ICT (Year 2009) | IDI (Year 2008) | IDI (Year 2010) 
81 | China | 4.47 

*KI, ICT, IDI Scores on 0-10 scale; 
**KI, KEI, ICT Data Source - World Bank; IDI Data Source - 
ITU 

According to the World Bank study, knowledge exercise in the Indian 
subcontinent has ground to cover in terms of education, 
innovation and information and communications technology 
(ICT), along with infrastructure development. 

Recently the World Bank has made data related to KI and KEI for 
the year 2012 available, and it is presented in Table-3. It is 
evident that there is sharp competition amongst countries 
to distinguish themselves in the knowledge society. In 
the knowledge society the first three positions are held 
by Denmark, Sweden and Finland, with relatively high 
ICT scores. However, it appears that the capability to generate, 
adopt and diffuse knowledge does not always enhance the 
capability to use it. Table-3 shows that though the 
Netherlands has the same KI score as Finland, in KEI it scores 
much less, which finally affects its KEI rank. 


KEI Rank | Country | KEI (Year 2012) | KI (Year 2012) | ICT (Year 2012) 
1 | Sweden | 9.43 | 9.38 | 9.49 
2 | Finland | 9.33 | 9.22 | 9.22 
3 | Denmark | 9.16 | 9.00 | 8.88 
6 | New Zealand | 8.97 
8 | Germany | 8.90 | 8.83 | 9.17 
9 | Australia | 8.88 
10 | Switzerland | 8.87 | 8.65 | 9.20 
... 
108 | Indonesia | 3.11 
109 | Honduras | 3.08 | 3.00 | 3.24 
110 | India | 3.06 | 2.89 | 1.90 
111 | Kenya | 2.88 | 2.91 | 2.91 
112 | Syrian Arab Republic | 2.77 | 3.01 
Table-3 
*KI, ICT Scores on 0-10 scale; 
**KI, KEI, ICT Data Source - World Bank; 


In this spirited segment Australia, New Zealand and Germany 
are new entrants, pushing the USA, the United Kingdom and Ireland 
out of the top ten slots. The slide of countries like the USA and UK out of 
the top ten indicates the state of competitiveness in efforts to 
improve the state of education, new knowledge creation in 
terms of research and development (R&D), patent registration 
etc., and the state of ICT exercise, along with efforts to 
convert the same into economic terms through 
improvement of the overall environment. Micro and macro 
planners at the national level need to note the Indian slide by 
one more rank with reference to KI and KEI, in order to 
advance efforts to be part of the knowledge society. 


9. CONCLUSION 

Human civilization has gone through various stages such as the 
agrarian, industrial and advanced industrial stages, and has 
finally arrived at the Information Age, where knowledge has been 
identified as an economic entity [5, 6, 7], as discussed before. An 
advanced Knowledge Society is expected to provide opportunities 
for knowledge creation and its application for a better economic 
edge. KI and KEI are the measures identified by the World Bank 
that help in assessing the success of a country in efficient 
production and effective use of knowledge. These measures 
help to present a perspective, which need not be taken in 
absolute terms, though it is irrefutable that the process helps in 
taking the necessary steps towards knowledge advancement in 
society. 

In the present context, Tables 1, 2 and 3 make it clear 
that countries with higher ICT and IDI scores have higher KI, which 
also vindicates the observations made in the ITU studies and leads to the 
conclusion that ICT preparedness affects knowledge exercise 
positively, leading to a knowledge society. Collectively these 
studies indicate that, to realize higher ICT and knowledge 
generating capability, the academic environment, the cost of 
connectivity and other ICT resources need due attention, along 
with improvement of infrastructure in a country. 


FUTURE SCOPE 

Technology changes at a fast pace and so does the overall scenario; 
measures are adjusted accordingly. Thus assessment of the 
Knowledge Society needs to be a continuous process and needs 
to be realigned with the changing scenario. 


ABSTRACT 

Image watermarking is considered a powerful tool for 
copyright protection, content authentication, fingerprinting 
and protecting intellectual property. In this 
paper we present a watermarking algorithm based on block-wise changes to the 
magnitude in the DFT domain. This algorithm can be used as an 
application for copyright protection. To provide multi-level 
security we first use best self-synchronizing T-codes to 
encode the watermark. The encoded watermark is then 
embedded into the cover image using a stego-key. We have 
analyzed our algorithm against noise such as Salt and Pepper, 
Gaussian and Speckle. 


KEYWORDS 
Watermark, DFT Composition, Image Processing 


1. INTRODUCTION 

Watermarking is a branch of information hiding that deals 
with embedding data in inconspicuous files or cover 
objects such as images, video, audio, graphics, text or packet 
transmissions in a perceptually transparent manner. Digital 
watermarking is an attempt to address the growing concerns 
about proof of ownership, content authentication, copyright 
violation, tamper proofing, illegal copying and distribution, and 
issues such as fake currency. The basic attributes of 
watermarking techniques are robustness, security and 
undetectability. 

There are three common steps of Watermarking techniques 
viz., 

1. Design of watermark, 

2. Watermark embedding and 

3. Watermark extraction. 

There are various domains of information hiding, viz., the spatial 
domain, the transform domain and the spread spectrum domain. The 
simplest spatial domain method of watermark embedding is 
changing the least significant bits (LSBs) of the cover image, 
but it is not robust to the addition of noise or lossy compression. 
Since degradation in smoother regions of an image is more 
noticeable to the human visual system (HVS), it is preferable to 
hide the watermark in noisy regions and edges of images. 
Transform domain based hiding techniques not only have the 
potential to achieve higher capacity than spatial domain 
based techniques, they are also found to be more robust. 
Therefore, methods based on the transform domain have received more 
attention than those in the spatial domain. One can embed a watermark by 
changing the LSBs in the block based transform domain or in 
the global transform domain. A watermark embedding 
operation can be carried out in a transform domain such as the 
Discrete Fourier Transform (DFT), Discrete Cosine Transform 
(DCT), Discrete Wavelet Transform (DWT), Singular Value 
Decomposition (SVD) Transform, Karhunen-Loeve Transform 
(KLT) and Discrete Hadamard Transform (DHT). 

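The spatial-domain LSB method mentioned above can be sketched in a few lines. This is only an illustration of the general idea, assuming an 8-bit grayscale cover held in a NumPy array; the function names are hypothetical and this is not the scheme proposed in this paper.

```python
import numpy as np


def lsb_embed(cover, bits):
    """Write one message bit into the least significant bit of each pixel."""
    flat = np.asarray(cover, dtype=np.uint8).ravel().copy()
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.size > flat.size:
        raise ValueError("message longer than the number of pixels")
    # Clear the lowest bit of the first len(bits) pixels, then set it to the message bit.
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(np.shape(cover))


def lsb_extract(stego, n):
    """Read back the first n embedded bits."""
    return (np.asarray(stego, dtype=np.uint8).ravel()[:n] & 1).tolist()
```

Each pixel changes by at most one grey level, which is why the method is imperceptible, but any noise or lossy compression that perturbs the lowest bit destroys the message.
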
This paper is about information hiding in still images. Most of 
the research on watermarking is focused on images. Apart from 
text, images have been used widely as cover objects for 
information hiding, as their digital representation 
provides a high degree of redundancy. Transform domain techniques are 
independent of the image format and hide data in more 
significant areas of the transformed image. The details of 
different watermarking techniques can be found in [3, 4, 12, 15, 
17, 19, 20, 21]. 

M. Barni et al [7] and R. Dugad [8] have shown DCT or DWT 
domain semi-blind watermarking schemes to be robust against 
a number of attacks. However, their methods result in 
weaker detection when a geometric attack (e.g., rotation, 
translation or scaling) is tried, due to the change in the 
location of the transform coefficients. Therefore, some 
researchers [2], [9], [11] have emphasized DFT-based 
watermarking because of the properties of the DFT. The DFT 
of an image is generally complex valued, and this leads to a 
magnitude and phase representation of the image. Most of the 
information about any typical image is contained in the phase, 
and the DFT magnitude coefficients convey very little 
information about the image. Thus one would expect that good 
image compression techniques would give much higher 
importance to preserving the DFT phase than the DFT 
magnitude. 

Ridzon R and Levicky D [16] have discussed robust 
watermarking techniques and proposed a robust digital 
image watermarking technique based on the discrete Fourier 
transform and log-polar mapping. 

V. Solachidis and I. Pitas [18] have presented an algorithm for 
rotation and scale invariant watermarking of digital images. 
An invisible mark is embedded in the magnitude of the DFT 
domain. The algorithm is shown to be robust to compression, 
filtering, cropping, translation and rotation. 

M. Ramkumar et al [13] have observed that all major 
compression schemes such as JPEG, SPIHT and MPEG 
preserve the DFT magnitude coefficients as well as 
the DFT phase. The other advantage of using the DFT 
magnitude domain for watermarking lies in its property of 
translation or shift invariance. A cyclic translation of an image 
in the spatial domain does not affect the DFT magnitude, and 
because of that, watermark embedding in the DFT magnitude 
domain remains translation-invariant. 
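
The shift-invariance property can be checked numerically. The snippet below is a small illustration using NumPy's FFT on a synthetic random image; the image size and the shift amounts are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

# Synthetic 16x16 "image"; any real-valued array would do.
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(16, 16))

# Cyclic translation by 3 rows and 5 columns.
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))

spectrum_original = np.fft.fft2(image)
spectrum_shifted = np.fft.fft2(shifted)

# The magnitudes agree, while the complex spectra differ
# (the shift only multiplies each coefficient by a unit-modulus phase factor).
magnitude_unchanged = np.allclose(np.abs(spectrum_original), np.abs(spectrum_shifted))
spectrum_unchanged = np.allclose(spectrum_original, spectrum_shifted)
```

Note that this holds for cyclic shifts; a translation that moves content out of the frame is only approximately covered by this property.
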

Farid Ahmad [1] has proposed a dual Fourier-Wavelet domain 
watermarking technique for authentication and identity 
verification. He has embedded a robust signature and hidden it 
in a mid-band wavelet subband using a Fourier domain 
bit-embedding algorithm. His method shows 
compression tolerance. 

In this paper, we present a watermarking algorithm based on the 
DFT magnitude domain that uses self-synchronizing variable 
length codes, viz., T-codes, for embedding the watermark. In 
section 2 we explain the proposed algorithm. The experimental 
results of the algorithm are presented in section 3. In section 4, 
we conclude and give suggestions on the future scope of this 
work. 

2. THE PROPOSED WATERMARKING TECHNIQUE 
We propose a watermarking technique based on block-wise changes to the 
magnitude of DFT coefficients. The cover image is divided into 
8x8 or 16x16 blocks and one bit of the secret message (watermark) 
is embedded into each randomly selected DFT block. The 
maximum payload (capacity) of the watermark is equal to the 
number of blocks constructed in the cover image. Moreover, 
the watermark (i.e., the hidden message) is imperceptible. 
Using best T-codes in the embedding process has 
a two-fold advantage: first, we can have better embedding 
capacity, and second, T-codes have an inherent self-synchronizing property. 
Ulrich [6] has shown that T-codes show the best 
synchronization performance amongst the most efficient 
variable length codes and require between 1.5 and 3 
characters to attain synchronization following a lock loss. 
Further, A.C.M. Fong et al [5] have shown that T-codes 
provide better robustness against most 
common signal distortions. S.K. Muttoo and Sushil Kumar [10] 
have shown that T-codes give better imperceptibility 
(in PSNR) when they replace Huffman codes in 
steganographic methods (jpeg-jsteg/Outguess). The steps of 
the embedding method are described in figure 1. 


The embedding algorithm is summarized as follows: 

0. Input the cover image and watermark (i.e., text) 

1. Divide the cover image into 16x16 (or 8x8) blocks and 
apply DFT to each block 

2. Enter watermark (i.e., text or message) 

3. Obtain the secret message, m, by encrypting the original 
message using best T-codes 

4. Let n = size (secret message) and nb = total number of 
blocks 

5. Use PRNG to obtain a permutation of ‘nb’ random 
numbers, say r 

6. If (n <= nb) then 

For i = 1 to n do 

6.1 Select the random DFT block r_i 
6.2 Embed m_i, the i-th secret message bit, into r_i as follows: 

If m_i = ‘1’ 
Change the block r_i's magnitude by some 
amount such that 
the change remains imperceptible 
Else 
r_i remains unchanged 

7. Output: Watermarked image. 
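
The steps above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: the stego-key is used as the PRNG seed, the "change of magnitude" is a fixed increment `delta` added to one conjugate pair of coefficients at the hypothetical position `coef`, the image is kept in floating point, and the message is assumed to be already T-code encoded into a list of bits. The paper's own Matlab implementation may differ on all of these points.

```python
import numpy as np


def embed_bits(cover, bits, key, block=8, delta=40.0, coef=(3, 3)):
    """Embed one bit per pseudo-randomly chosen DFT block (illustrative sketch)."""
    h, w = cover.shape
    rows, cols = h // block, w // block
    nb = rows * cols                                      # step 4: number of blocks
    if len(bits) > nb:                                    # step 6: capacity check
        raise ValueError("message longer than the number of blocks")
    order = np.random.default_rng(key).permutation(nb)   # step 5: keyed block order
    out = np.asarray(cover, dtype=np.float64).copy()
    u, v = coef                                           # assumed not self-conjugate
    for bit, idx in zip(bits, order):
        if bit != 1:
            continue                                      # bit 0: block unchanged
        r, c = divmod(int(idx), cols)
        rs = slice(r * block, (r + 1) * block)
        cs = slice(c * block, (c + 1) * block)
        spectrum = np.fft.fft2(out[rs, cs])
        mag, phase = np.abs(spectrum), np.angle(spectrum)
        # Raise the magnitude at (u, v) and at its conjugate partner so that
        # the inverse transform stays real; the phase is left untouched.
        mag[u, v] += delta
        mag[-u, -v] += delta
        out[rs, cs] = np.real(np.fft.ifft2(mag * np.exp(1j * phase)))
    return out
```

For a 512 x 512 image and 8x8 blocks this gives 4096 candidate blocks, so the capacity check in step 6 passes for any message of up to 4096 bits. Rounding the result back to 8-bit pixels is not handled here and would require `delta` to be large enough to survive quantization.
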












[Figure 1 block labels: 8x8 or 16x16 blocks; Embedding one bit 
per DFT block] 

Figure 1: “The block diagram for watermark embedding 
process” 


To extract the watermark, we compare each DFT block’s 
magnitude in the watermarked image with the magnitude of the corresponding DFT block of the 
original image. If they are the same, the embedded bit 
is ‘0’; otherwise it is ‘1’. The original message is then obtained 
by decrypting the extracted message using best T-codes. The 
extraction process is shown in figure 2. 
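
The comparison rule can be sketched as below. As with any illustration here, several details are assumptions rather than the paper's implementation: the stego-key seeds the same PRNG that ordered the blocks during embedding, images are compared in floating point, and "same magnitude" is tested with a small absolute tolerance `tol`. T-code decoding of the recovered bits is not shown.

```python
import numpy as np


def extract_bits(watermarked, original, n, key, block=8, tol=1e-6):
    """Recover n bits by comparing per-block DFT magnitudes (illustrative sketch)."""
    watermarked = np.asarray(watermarked, dtype=np.float64)
    original = np.asarray(original, dtype=np.float64)
    h, w = original.shape
    rows, cols = h // block, w // block
    # Regenerate the keyed block order that the embedder is assumed to have used.
    order = np.random.default_rng(key).permutation(rows * cols)
    bits = []
    for idx in order[:n]:
        r, c = divmod(int(idx), cols)
        rs = slice(r * block, (r + 1) * block)
        cs = slice(c * block, (c + 1) * block)
        mag_w = np.abs(np.fft.fft2(watermarked[rs, cs]))
        mag_o = np.abs(np.fft.fft2(original[rs, cs]))
        # Same magnitudes -> bit 0, any difference beyond tol -> bit 1.
        same = np.allclose(mag_w, mag_o, rtol=0.0, atol=tol)
        bits.append(0 if same else 1)
    return bits
```

Because the rule flags any magnitude difference as a ‘1’, noise added to an unmodified block can flip a ‘0’ into a ‘1’; in practice the tolerance would have to be tuned against the expected noise level.
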





[Figure 2 block labels: Cover image; Stego image; Divide into 8x8 
or 16x16 blocks; Extracting one bit per block] 

Figure 2: “The block diagram of the watermark extraction 
process” 


3. EXPERIMENTAL RESULTS 

We have implemented our algorithm in Matlab 7.0 on 
‘png’ and ‘tif’ images. The issues of imperceptibility, robustness 
and security are analyzed. 


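3.1 IMPERCEPTIBILITY 

For imperceptibility, we used the PSNR as the measure. 
A summary of some of the results obtained is 
shown in table 1. 


PSNR = 10 \log_{10} (255^2 / MSE) 
MSE = (1/N) \sum_i \sum_j (x_{ij} - x'_{ij})^2 

where x_{ij} and x'_{ij} are the pixel values of the cover and 
watermarked images and N is the number of pixels. 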
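
The PSNR measure can be computed directly from its definition. The sketch below assumes 8-bit images (peak value 255) and uses a hypothetical function name; it is not the Matlab code used for the reported results.

```python
import math

import numpy as np


def psnr(original, distorted, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE), with MSE averaged over all pixels."""
    x = np.asarray(original, dtype=np.float64)
    y = np.asarray(distorted, dtype=np.float64)
    mse = float(np.mean((x - y) ** 2))
    if mse == 0.0:
        return math.inf          # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values mean the watermarked image is closer to the cover; the value grows without bound as the two images become identical.
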





Table 1: “Image: ‘lena.png’; 
Size of image: 512 x 512 x 3" 


3.2 ROBUSTNESS 

We have analyzed our technique against Salt & Pepper, 
Gaussian and Speckle noise. Some of the results are 
summarized in table 2. 

Noise | Noise density / Variance | PSNR | Imperceptibility 
Salt and Pepper | | 24.508046 | 
| 0.0001 | 36.679711 | YES 
Gaussian | 0.0005 | 32.201980 | YES 
| | 31.795482 | YES 
Speckle | | | 

Table 2: “Image: ‘lena.png’; n = 1023; 
PSNR (without noise) = 39.423371” 





4. CONCLUSION AND FUTURE SCOPE 
The algorithm proposed in this paper makes use of the DFT 
magnitude domain for watermark embedding. A watermark can 
be embedded with capacity equal to the number of blocks created 
from the cover image. Thus, one can have better embedding capacity. 
From the experimental results shown above, we observe that 
the method is robust against added noise such as Salt and 
Pepper, Gaussian and Speckle to the extent that the watermark remains 
imperceptible. 

Our extraction algorithm needs the original cover image to 
reveal the hidden text from the stego image, i.e., our scheme is a 
‘cover escrow scheme’. The other kind of scheme, known as a ‘blind 
scheme’, does not require the original cover image to detect 
the hidden message. It is observed that traditional block 
transform coding of images may generate artifacts near block 
boundaries that degrade low bit rate coded images. 
Wavelet transforms have been used in frequency domain techniques 
because they make the process of imperceptible 
embedding more effective. The wavelet transform produces far 
fewer blocking artifacts than the DCT and also performs well 
in image de-noising. Wavelets are found to be well adapted to 
point singularities, but they have a problem with orientation 
selectivity: they are not efficient in representing contours 
that are not horizontal or vertical. To eliminate the blocking effect, 
new transforms such as the Contourlet transform (CTT) and 
Lapped transforms (LOT) have been investigated in the past. 
These transforms have not yet been explored fully in 
information hiding. A CTT-DWT combination is suggested in the 
literature to be a good candidate for a new compression codec. 


ABSTRACT 

This paper presents a homography based ground plane 
detection method. The method is developed as a part of a stereo 
vision based obstacle detection technique for visually 
impaired people. The method assumes the presence of a texture 
dominant ground plane in the lower portion of the scene, which 
is not a severe restriction in the real world. The SIFT algorithm is used 
to extract features in the stereo images. The extracted SIFT 
features are robustly matched by model fitting using RANSAC. 
A sample of putative matches lying in the lower portion of the 
image is selected. A fitness function is developed to select 
matches from this sample, which are used to estimate the ground 
plane homography hypothesis. The ground plane homography 
hypothesis is used to classify the SIFT features as either 
belonging to the ground plane or not. Image segmentation using 
mean shift and normalized cut is further used to filter the 
outliers and augment the ground plane. Experimental tests 
have been conducted to test the performance of the proposed 
approach. The tests indicate that the proposed approach has a 
good classification rate and an operating distance range 
from 3 feet to 12 feet. 


KEYWORDS 
Ground Plane; SIFT; Electronic Travel Aid, Homography 


1. INTRODUCTION 

To perceive the environment around them, humans 
depend upon five senses: vision, hearing, smell, touch and 
taste. Among these, vision is undoubtedly the most dependable 
one. Most people cannot imagine what life would be like if they 
lost it. This is, however, a hard reality for the 45 million people 
worldwide who are blind. The World Health Organization (WHO) 
estimated in the year 2010 that the worldwide count of 
Visually Impaired (VI) people is about 314 million and that 45 
million amongst them are completely blind [1]. 

VI people experience serious difficulties in leading an 
independent life, due to reduced perception of the environment 
[2]. The most obvious problem faced by VI people is 
navigating unknown environments without bumping into 
unexpected obstacles. Thus, obstacle detection is one of the 
major problems that need to be solved to ensure safe navigation 
for VI people. 

The problem of obstacle detection may often be reduced to the
problem of ground plane detection. With the ground plane
detected, other objects can be viewed as obstacles if they
protrude out of the ground plane. In this paper, we
propose a homography-based ground plane detection method.
The method is a part of a stereo vision based obstacle
detection technique developed for VI people. The proposed
method assumes the presence of a texture dominant ground
plane in the lower portion of the scene, which is not a severe
restriction in the real world. The SIFT matches lying in the lower
portion of the image and selected by a fitness function are used
to generate a ground plane homography hypothesis. The
generated homography hypothesis is used to classify matched
SIFT features as either belonging to the ground plane or not.

The paper is organized as follows: Section 2 presents an 
overview of related work. Section 3 explicates the theoretical 
background of homography. Section 4 presents the proposed 
approach in detail. The experimental results are given in Section 
5. Section 6 makes the concluding remarks. 


2. RELATED WORK 

In the context of navigation for a visually impaired user, an obstacle
can be defined as “anything that stops the progression of the
user and/or requires the modification of his/her posture.” For
many years, VI people have relied greatly on the
white cane during navigation. However, the white cane has
an inherent disadvantage: it cannot be used to obtain
information about obstacles beyond its reach and hence
cannot help the user in broad route planning.

Since the 1960s, extensive research has been carried out on
developing electronic devices, known as Electronic Travel Aids
(ETAs), to assist VI people in autonomous navigation [3]. A
number of ETAs that make use of radar, lidar and sonar
technology [4-11] have been developed. However, their major
disadvantages include interference with the environment,
difficult interpretation of the output signals, high power
consumption, high acquisition price, poor angular resolution and
inability to detect small obstacles.

Vision-based ETAs have seen tremendous development in
recent years, largely due to the availability of low-cost cameras
and compact yet high performance processors that support
image processing. Being passive in nature, vision based aids
have low power consumption and do not interfere with the
environment. These systems typically capture
an image of the environment and map it into sound
or vibratory pulses without undertaking any image processing
to extract information about the objects in the scene. In
general, the background fills more area in an image frame than the
objects and hence conversion from unprocessed images
leads to information overload, with background details masking
primary mobility information.

Automatic pre-processing to provide mobility data at a high
level of abstraction, by eliminating detailed clutter but retaining
essential mobility information, can alleviate the problem of
information overload. Ground plane perception is
vital information for human mobility [12]. Gibson in [13] suggested
that “the spatial character of the visual world is given not by
the objects in it, but by the ground and the horizon.” Molton
[14] developed a stereo-based mobility aid for partially sighted
people by estimating the ground plane using disparity information.
Many approaches [15-25] to ground plane estimation for
mobile robot and autonomous guided vehicle (AGV)
navigation have been investigated by various researchers.
These approaches rely mostly on the processing of different
features attached to the ground plane: color [18, 19, 20],
texture (lane markings) [18], disparity [14, 19, 20], v-disparity
[21, 22], motion (optical flow [19, 23]) and homography
estimation [24, 25].


3. THEORETICAL BACKGROUND 
There exist projective relationships between two viewpoints of a
scene taken from a stereo rig. The corresponding points in
stereo images, taken from uncalibrated cameras, are related by a
fundamental matrix. If x and x’ are the homogeneous image
coordinates of the corresponding points {x ↔ x’} in a stereo
image pair, Fx describes an epipolar line on which the
corresponding point x’ in the other image must lie. For each
pair of corresponding points, the epipolar constraint is expressed
as:

$x'^{T} F x = 0$, (1)
where F is the fundamental matrix. It is a 3 by 3 matrix of rank 2
with seven degrees of freedom, hence it can be recovered from
7 point correspondences.

If a set of points lies in a plane and is imaged from two
viewpoints, then the homogeneous coordinates of the
corresponding points {x_i ↔ x_i’} in the two images are related by
a plane-to-plane projectivity or homography such that:

$\lambda x_i = H x_i'$, (2)
where H is a 3 by 3 matrix representing the homography and λ is a
scalar. Since equation 2 is valid up to a scale factor, H has only
eight degrees of freedom and it is normal practice to choose λ
such that the element $h_{33}$ in H is set to unity. To determine H,
four corresponding non-degenerate coplanar points are
required, since each point correspondence provides two
independent constraints, thereby making the determination of H
possible by standard linear methods. However, in reality, with
the data being imperfect, a larger number of point
correspondences should be used for the accurate estimation of
H.
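
The four-point case described above can be sketched in a few lines. The following is only an illustrative sketch, not the authors' implementation: it assumes the convention of equation 2 (a point x’ in one image is mapped to x in the other), fixes the last element of H to 1, and solves the resulting 8 by 8 linear system by plain Gaussian elimination. The function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_four_points(src, dst):
    """Return a 3x3 H (last element fixed to 1) with dst ~ H * src.

    Each correspondence (u, v) -> (x, y) contributes two linear equations,
    so four correspondences give the eight equations needed.
    """
    A, b = [], []
    for (u, v), (x, y) in zip(src, dst):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        b.append(x)
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        b.append(y)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, p):
    """Map an inhomogeneous 2D point p through H."""
    u, v = p
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)
```

With more than four noisy correspondences, a least-squares or RANSAC-based estimate would be used instead of this exact solve.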


4. PROPOSED APPROACH
The steps involved in the proposed approach are described in 
the following sub-sections in detail: 


4.1 IMAGE PROCESSING 

The stereo images grabbed with low cost web cameras, arranged
to form a stereo rig, often contain noise. The image noise can be
reduced by using a Gaussian filter. The Contrast-Limited Adaptive
Histogram Equalization (CLAHE) algorithm [26] is then
applied to enhance the local contrast and selectively highlight
the obvious features, so that the resultant images are more
conducive to feature extraction. As opposed to histogram
equalization, CLAHE operates on small regions in the image. It
enhances the contrast of each region and eliminates the
artificially induced boundaries between neighboring regions by
using bilinear interpolation.
Figure 1 shows the result of the image preprocessing stage.





Figure 1: “The results of preprocessing stage”: (d) Histogram of (a), (e) Histogram of (b), (f) Histogram of (c)


4.2 FEATURE EXTRACTION 

The preprocessed image is given to the SIFT feature extractor.
The SIFT algorithm proposed by Lowe [27] consists of four
major stages: (1) scale-space extrema detection, (2) keypoint
localization, (3) orientation assignment, and (4) keypoint
descriptor.

Among various image features like corners, edge features,
moment invariants, etc., SIFT features have been an obvious
choice for the proposed approach because they are invariant to
scale, orientation, and affine distortion, and partially invariant to
illumination changes. The keypoints extracted using the SIFT algorithm
for Figure 1 (a) and (c) are shown in Figure 2.





Figure 2: “SIFT features extracted for Figure 1(a) and (c)”: (a) SIFT features detected in Original image, (b) SIFT features detected in CLAHE applied image


The larger number of SIFT features in Figure 2(b) illustrates the
advantage of the preprocessing step.


4.3 FEATURE MATCHING 

The SIFT features extracted in the left and right stereo images
are matched using the procedure prescribed in [27, 28]. The
matches that are too ambiguous are rejected. Matching follows a
nearest neighbor approach. The feature $F_i$ in the left image
matches feature $F_j$ in the right image if the distance

$d(F_i, F_j) < t \, d(F_i, F_k)$ for all features $F_k$ in the right image with $k \neq j$, (3)
where t is a threshold. We have selected the value of the
threshold t to be 0.5. The initially obtained matches are used to
estimate the homography and fundamental matrices between the pair
of images using RANSAC [29]. The set of matches which fits a
certain model (homography or fundamental matrix) is
considered as the set of inliers for that model. In order to further increase
the robustness, an outlier rejection rule, called X84 [30], is
applied. The robustified inliers are used to re-estimate the
parameters of the models. The best-fit model (homography or
fundamental matrix) is selected according to the Geometric
Robust Information Criterion (GRIC) [31]. The final matches
are the inliers from the best-fit model.
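
The nearest neighbor test of equation 3 can be illustrated with a short sketch. It assumes descriptors are plain numeric tuples compared by Euclidean distance and uses the threshold t = 0.5 stated above; the function name `match_features` is hypothetical, and the RANSAC, X84 and GRIC stages are not covered here.

```python
import math


def match_features(left, right, t=0.5):
    """Return (i, j) index pairs satisfying the ratio test of equation 3.

    left[i] is matched to its nearest neighbour right[j] only if that
    distance is smaller than t times the distance to every other
    descriptor in the right image (i.e. to the second nearest one).
    """
    matches = []
    for i, f in enumerate(left):
        dists = sorted((math.dist(f, g), j) for j, g in enumerate(right))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < t * d2:
            matches.append((i, j1))
    return matches
```

Checking only the second nearest neighbour is sufficient, because if the inequality holds for it, it holds for every farther descriptor as well.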


4.4 GROUND PLANE HOMOGRAPHY HYPOTHESIS 
Since the algorithm assumes that the texture dominant ground
plane lies in the lower part of the scene, a sample of putative
matching points lying in the lower part of the image is selected.
In our experiment, we have created the sample with the putative
matches that lie in the lower 10% of the image. A heuristically
designed predefined ground plane mask, as in [20], or a
trapezoidal region in the lower central part of the image, as in
[32], can also be used to select a sample of putative matches that
lie on the ground plane.
The selection of four initial points from this sample to estimate
the homography is of vital importance. These four initial points
influence the likelihood that they determine a valid
homography. The selection of the four initial points from the
sample is based on the following criteria [33]:
1. Selected points should not be too distant.
2. Selected points should not be too close.
3. No three selected points should be collinear or near
collinear.
4. Selected corresponding points should have a large disparity
in position.
In practice, more than four points are required for the accurate
estimation of the homography matrix H. Algorithm SelectBestN,
listed in Table 1, is used to select the best N points based on the
fitness score computed for each point according to the
mentioned fitness criteria.
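
The first three criteria can be checked with a simple geometric test. The sketch below is only an illustration under stated assumptions, not the SelectBestN algorithm itself: the thresholds `d_min`, `d_max` and `area_min` are hypothetical parameters, and near-collinearity is approximated by a small triangle area.

```python
import math
from itertools import combinations


def triangle_area(a, b, c):
    """Area of the triangle spanned by three 2D points."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def is_valid_point_set(points, d_min, d_max, area_min):
    """Check criteria 1 to 3 for a candidate set of points.

    Every pair must be neither too close (< d_min) nor too distant
    (> d_max), and no triple may be collinear or nearly collinear
    (triangle area below area_min).
    """
    for a, b in combinations(points, 2):
        d = math.dist(a, b)
        if d < d_min or d > d_max:
            return False
    for a, b, c in combinations(points, 3):
        if triangle_area(a, b, c) < area_min:
            return False
    return True
```

Criterion 4 (large disparity between corresponding points) would need the matched positions in the second image and is left out of this sketch.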


Algorithm SelectBestN(M, M’, n, Te, Td, N)

1 [...]
2 r ← 1
3 Form pairs of points (p_i, p_j), such that p_i, p_j ∈ M, 1 ≤ i ≤ n AND i < j.
  Select a pair (p_i, p_j) and for every pair repeat steps 4 to 8
4 Find the midpoint m of the points p_i and p_j, fit a line l passing through
  these points and fit a line perpendicular to l and passing
  through the midpoint m
5 [...]
6 If distance d(p_k, m) > Te and d(p_k, m) < Td, set i(k) = 1, else set i(k) = 0
7 Compute fs(r,k), a weighted sum of distance terms (weights 0.3, 0.2, ...) and the indicator i(k)
8 r ← r + 1
9 Select r, such that median(fs(r,:)) is maximum.
10 Sort fs(r,:). Return p_i and p_j for the selected value of r and the topmost
   N-2 points from fs(r,:)

Table 1: “ALGORITHM to select best N-points based on fitness scores”





det(H)                 1.04012   0.9950   0.8681   1.2627
Avg(dist(x-Hx’))       0.8857    0.8489   1.1525
Median(dist(x-Hx’))    0.9280    1.1329

No. of pts. used for H estimation

[...]                  1.4764    1.0605   1.0569
[...](dist(x-Hx’))     1.1692    3706

No. of pts. used for H estimation

[...]                  75435     1.4802   1.1583


Figure 3 shows the result of execution of the SelectBestN
algorithm. In the figure, the ‘+’ markers are SIFT matches and the ‘o’
markers are the sample lying in the lower 10% of the image. The
‘@’ markers are the points selected by the SelectBestN algorithm for N=6.





Figure 3: “Result of SelectBestN algorithm” 


4.4.1 GOODNESS OF HOMOGRAPHY ESTIMATE 

The determinant of the homography matrix signifies the
goodness of a homography estimate [34]. If the determinant
tends towards zero, it suggests the onset of degeneracy in
the selected points. Table 2 lists the values of the determinant of
the homography matrix, the average distance error and the median of
distance errors for three different test executions.

Table 2: “Goodness of homography estimate for different test runs” (Test Run No. 1, Test Run No. 2, Test Run No. 3; rows: No. of pts. used for H estimation, det(H), Avg(dist(x-Hx’)), Median(dist(x-Hx’)))

The results indicate that Avg(dist(x-Hx’)) is very high if det(H)
tends to zero. The homography estimate with minimum average
distance error is selected for classifying SIFT features as either
“belonging to ground” or “not”.
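
The three quantities reported for each test run, and the selection of the estimate with minimum average distance error, can be sketched as follows. This is a minimal illustration assuming H is a 3 by 3 nested list and each correspondence is a pair (x, x’) of 2D points; the function names are hypothetical.

```python
import math
from statistics import mean, median


def det3(H):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = H
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def transfer(H, p):
    """Map the 2D point p (x' in the text) through H."""
    u, v = p
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)


def homography_errors(H, pairs):
    """Return (det(H), average, median) of dist(x, Hx') over all pairs."""
    errs = [math.dist(x, transfer(H, xp)) for x, xp in pairs]
    return det3(H), mean(errs), median(errs)


def select_best(candidates, pairs):
    """Pick the candidate homography with minimum average distance error."""
    return min(candidates, key=lambda H: homography_errors(H, pairs)[1])
```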


4.5 CLASSIFICATION OF SIFT FEATURES 


The corresponding points that lie on a plane share the same
homography, which is different from the homography for
another plane. In reality, the homography equation 2 is satisfied
only approximately. A pair of corresponding points (x, x’) is
considered to agree with the ground plane homography
hypothesis, H, if for some threshold ε,

$dist(x, Hx') < \varepsilon$ (4)
Equation 4 is used to classify the matched SIFT features as
belonging to the ground plane or not. Figure 4 shows the matched
SIFT features that have been classified as belonging to the ground
plane, with the value of ε set to 5.
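
Equation 4 amounts to a per-match threshold test, which the following sketch illustrates. It assumes H is a 3 by 3 nested list, each match is a pair (x, x’) of 2D pixel coordinates, and the threshold defaults to the value 5 used above; `classify_ground` is a hypothetical name.

```python
import math


def project(H, p):
    """Map the 2D point p through the homography H."""
    u, v = p
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)


def classify_ground(H, pairs, eps=5.0):
    """Return True for each match (x, x') with dist(x, Hx') < eps."""
    return [math.dist(x, project(H, xp)) < eps for x, xp in pairs]
```

For example, with a hypothesis that shifts points 10 pixels to the right, a match displaced by 30 pixels is rejected while matches displaced by roughly 10 pixels are accepted.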
Figure 4: “SIFT features belonging to ground plane”


4.6 AUGMENTING GROUND PLANE 


It is likely that the feature points classified as lying on the
ground plane are confined to a small region of the whole plane. In such
circumstances, the region enclosing the classified feature points
would not be an acceptable estimate of the ground plane. We have addressed this problem by
using an image segmentation algorithm based on mean shift and
normalized cuts [35]. The segmented region which encloses the
maximum number of feature points classified as lying on the
ground plane is finally labeled as the ground plane. This step has the
additional advantage of filtering outliers.
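
The final labeling step can be sketched as a simple vote over segment labels. The sketch assumes the segmentation has already been computed (by any method) and is given as a 2D list `labels` of integer segment ids indexed as labels[y][x]; the function names are hypothetical and the mean shift and normalized cut stages themselves are not shown.

```python
from collections import Counter


def ground_segment_label(labels, ground_points):
    """Return the segment id enclosing the most ground-classified points."""
    if not ground_points:
        raise ValueError("no ground plane feature points given")
    votes = Counter(labels[y][x] for x, y in ground_points)
    label, _ = votes.most_common(1)[0]
    return label


def ground_mask(labels, ground_points):
    """Boolean mask of the pixels belonging to the winning segment."""
    g = ground_segment_label(labels, ground_points)
    return [[lab == g for lab in row] for row in labels]
```

Feature points that fall outside the winning segment are implicitly discarded, which corresponds to the outlier filtering mentioned above.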


5. EXPERIMENTAL TESTS AND RESULTS

A set of outdoor images was collected from the campus of
PEC University of Technology. Figures 5 and 6 (a and b) show
some of the sample stereo image pairs. The images were
taken at different times and under different illumination conditions. Images of
different ground planes, i.e. tiled, cemented, grassed, etc., and
artificial obstacles of different sizes were taken. Enough
variation is kept to make the classification task challenging.





Figure 5: “(a-b) stereo image pair-I, (c) SIFT matches in stereo
images, (d) features classified as belonging to ground plane”.
Panel (d), Ground Plane Estimation: ground plane features (‘*’), non-ground plane
features (‘+’)

Figure 6: “(a-b) stereo image pair-II, (c) SIFT matches in stereo
images, (d) features classified as belonging to ground plane”.
Panel (a): Left Image. Panel (d), Ground Plane Estimation: ground plane features (‘*’), non-ground plane
features (‘+’)


On execution of the proposed approach, the results are
collected in the form of points marked in the image: one cluster of points lying
on the ground plane and another cluster of non-ground plane points.
The points which actually lie on the ground plane and are marked
as ground plane points are termed true positives. Table 3
contains the result of execution of the proposed approach on
the sample image pairs.


Table 3: Result of execution on image pairs (rows: Ground Plane points found, Non-Ground Plane points found)


Figure 7 shows the ROC (Relative Operating Characteristic)
curve. Points above the diagonal in the ROC curve
indicate that the proposed approach has a good classification
rate.
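
Each point of such a curve is a (false positive rate, true positive rate) pair. The sketch below shows how one such point could be computed from per-feature decisions and hand-labeled ground truth; the function name and the boolean input format are assumptions made for illustration, with True meaning "ground plane".

```python
def roc_point(predicted, actual):
    """Return (false positive rate, true positive rate) for one run.

    predicted[i] is the classifier decision and actual[i] the ground
    truth for feature i.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr
```

A point lies above the diagonal when its true positive rate exceeds its false positive rate; repeating the computation for different values of the threshold in equation 4 would trace out a curve.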
Figure 7: “ROC Curve”


Experiments were also performed to determine the distance
range in which the proposed approach has a good
classification rate. At the time of grabbing the images, the distance
of the obstacles from the camera was noted. The analysis of the
proposed approach with respect to distance is shown in the form of
a graph in Figure 8.


Figure 8: “True Positive rate Vs Distance graph” (plot title: Distance Range Analysis)


The experimental results indicate that the appropriate operating
distance range for the proposed approach is 3 feet to 12 feet. If
the obstacle is less than 2 feet away from the camera, the
assumption that the lower 10% of the image is dominated by the
ground plane is violated. If the obstacle is placed very far away
from the camera, the features of the obstacle planes are lost and
the obstacle appears to be the same as the ground plane. In both cases, the
classification rate of the proposed approach is poor.


6. CONCLUSIONS

The paper presents a homography based approach for ground
plane detection. The homography is very sensitive to the
position of the points used for its estimation. The paper presents a
point selection algorithm which selects points based on fitness
criteria for the accurate estimation of the homography hypothesis. The
homography estimate with minimum average distance error is
used for classifying SIFT features as either belonging to the ground
plane or not. Generally, the feature points classified as lying on
the ground plane cover only a small region of the whole plane. This
problem has been addressed by image segmentation
using mean shift and normalized cuts. Experimental tests have
been conducted to evaluate the performance of the proposed
approach. The tests indicate that the proposed
approach has a good classification rate and an operating distance
range from 3 feet to 12 feet.


ABSTRACT

Directorate General of Foreign Trade, a department of the
Ministry of Commerce and Industry, Govt. of India, is
responsible for formulating, regulating and implementing the Foreign
Trade Policy through its 36 Port Offices throughout India. This
is a case study of one of the best e-governance projects. The project has been
highlighted in various e-governance seminars and workshops. It
is the first govt. project in which ICT was implemented, in 1998.
It was soon equipped with Digital Signature and Electronic Fund
Transfer facilities. The present study is an example of innovative
use of Information and Communication Technology (ICT) for
on-line delivery. The present services in the Directorate
General of Foreign Trade (DGFT) include e-licensing, e-BRC,
e-tendering, e-monitoring, e-meeting, e-delivery, e-PRC,
e-grievance redressal, etc. The web has played a
dynamic role in the reengineering and transformation of trade
processes for efficient, cost effective and seamless trade
facilitation.


1. INTRODUCTION 

The survey report in [1] highlighted the importance of
e-government in improving the public service delivery system to
help people in day to day life. E-governance is a tool for the
developmental activities of any country, while [2] clarifies that
ICT is a powerful medium for transmitting information and
knowledge to the end user. The most effective and fastest solutions can be
achieved by integrating technology and planning for economic
growth and sustainable human development. [3,4] reflect the
idea that ICT may help government in such a way that new
innovative arrangements can flourish in place of traditional
institutional arrangements. Such successful initiatives will
deliver benefits to citizens and improve the efficiency of
government and governmental agencies. [5] presents the role
of e-government. [7,9,10,11,13,14,16] are the web sites of
Maharashtra, West Bengal, Madhya Pradesh, Haryana,
Himachal Pradesh, Rajasthan, and Andhra Pradesh, showing
the various e-governance initiatives and applications being
implemented in the respective states. Most of these websites focus on
IT-enabled services and e-governance, which include call centre,
data processing and back office processing. [8] presents a comparative
impact study of the outcomes of many central government
e-governance projects. [12,15] describe the web site
features of the Ministry of Information Technology, including the
centralized e-governance projects. The objective is to
offer technical services like consultation, to create awareness among
decision makers at the centre as well as the state level, and to guide
them in implementing processes and policies for effective
governance. IT based services provide a better citizen
interface through computerization of land records, vehicle
registration, electricity/water billing, licensing, distance
education, health services, online examination, birth/death
certificates, e-post, e-court, etc. through the internet. Transparency
and free sharing of government databases are the major factors in
gaining public trust. Hence, innovative use of technology
is a powerful tool for fulfilling the requirements of the general citizen.
Further, [6] expressed the vital role of online services
around the world. The survey report shows that in many
countries e-government initiatives and information and
communication technology applications are being put in place so that
people have better public services. The catalytic role of
innovative technology solutions in government working has
gained special recognition. In the present world climate it
is very important for governments to expand electronic
service delivery systems in terms of e-government and
e-governance. [17] is an example of ICT evolution in banking,
while in [18] ICT strengthens the management and planning of
water resources in rural areas. Keeping in view the gist of
the cited reports, we have implemented the innovative
use of technology in Trade Facilitation in the Govt. of India in a
systematic manner as follows:

Directorate General of Foreign Trade (DGFT), an ISO
9001:2008 certified organization, regulates and facilitates
India’s foreign trade by implementing the Foreign Trade Policy
and its various Schemes, announced from time to time through
Public Notices, Notifications, Circulars, etc. One of the key
elements of the policy, apart from providing fiscal and financial
incentives to exporters, is to address the issue of high
transaction cost in India, so as to improve our global
competitiveness. The facilitator role includes resolving trade
disputes and attending to exporters’ / importers’ grievances.

At the global level, ease of doing business is one of the
important parameters on which the status of trade facilitation in
a country can be benchmarked. The World Bank’s Doing
Business Reports 2009 and 2010 have pointed out that India is
well behind comparable economies like China, Indonesia and
Mexico in this regard. The high transaction time and cost
associated with the foreign trade processes have an adverse
impact on the competitiveness of Indian exports.

With this end in view, the Directorate General of Foreign Trade
has established a web based trade facilitation system under
which EDI interfaces with the Trading Community and all
concerned stakeholders in the value chain have been
established. This includes Customs, Banks, Trade and Industry
and other Government Agencies, to facilitate seamless flow of
e-documents and information. The ‘Web’ is in fact the driving
engine in this entire endeavor.


2. DIRECTORATE GENERAL OF FOREIGN TRADE 
(DGFT) INFRASTRUCTURE 

Directorate General of Foreign Trade (DGFT) is a
multi-locational organization, with 36 offices throughout the country.
It has its Headquarters in Delhi and four Zonal offices at Mumbai,
Kolkata, Chennai and Delhi. All locations are interconnected
with high speed leased lines. A backup network
facility has also been provided using broadband.


3. ROLE AS WELL AS FUNCTIONALITIES OF DGFT 
The role and functioning of DGFT require intensive and
innovative use of Information and Communication Technology
(ICT). The ‘Web based solution’ has become a core
implementation strategy for delivery of an efficient, transparent
and easy to access service. The web service includes the
following for providing information and implementing
transactions:
(i) ‘On-line’ filing of applications for obtaining all
Authorizations through web (B2G model).
(ii) ‘On-line’ filing of applications for obtaining Importer
Exporter Code (IEC).
(iii) Interfaces with various Electronic Data Interchange
(EDI) Network Partners.
(iv) Hyperlinked Foreign Trade Policy/Procedure with latest
amendments / updates.
(v) A Comprehensive Chapter-Wise Directory of Products
based on Indian Trade Classification (ITC) for
importability / exportability.
(vi) Web based monitoring of status of various applications /
authorizations.


4. KEY FUNCTIONALITY THRUST AREAS OF 
DGFT’S WEBSITE 

A snapshot of the DGFT’s website is shown in Figure 1.

The functionality thrust of DGFT’s website is on the following
parameters.


4.1 CITIZEN FOCUS 

The citizen focus of the web delivery services is achieved
through:

• Accountability and ‘SMART’ e-Governance Services
(Specific, Measurable, Attainable, Realistic, Timely)

• Transparency in Operations and access to information

• Continuous improvement in performance and integrity of
public services

• Continuous simplification of Export Promotion and Trade
Facilitation measures

4.2 REACH

• ‘On-line’ facility for FTP operations available globally,
round the clock (24x7x365), through the DGFT’s web portal
(http://dgft.gov.in) to almost 5 lakh users

• 36 regional offices of DGFT spread (but virtually being
one) across the country

• Facilitation of a broad range of ‘e-filing’ applications
under different Schemes like Advance Authorization (AA),
Duty Entitlement Passbook (DEPB), Export Promotion
Capital Goods (EPCG), incentive/reward schemes i.e.
Focus Market / Products, Vishesh Krishi Upaj Yojna,
etc.

• EDI linkages with Trade and Industry, Government
Agencies and related EDI community partners, i.e.
Customs, banks, EPCs, etc.


4.3 SCOPE 

(i) Information Access:

• Foreign Trade Policy / Procedure, Publication of
Notifications / Public Notices / Circulars / Trade Notices

• Indian Trade Classification for Harmonised System
(ITCHS) for providing status on importability /
exportability

• Standard Input / Output Norms (SION) for providing
details on imports required for export products


(ii) Transaction Facility:

• Covers all models of e-governance i.e. B2G, G2G, G2B,
G2C and C2G.

B2G: ‘On-line’ filing of application for authorization / Importer
Exporter Code (IEC) by any business organization

G2G: Message Exchange with Customs, Banks

G2B: Model for ‘on-line’ filing of applications for
authorization / Importer Exporter Code (IEC) and Status
thereof

G2C: ‘On-line’ tracking of application status

C2G: ‘On-line’ filing of application by individual.


(iii) Monitoring and Tracking

• Redressal of Trade related queries.

• MIS available on real time basis.

• Elimination of fraudulent practices by unscrupulous
elements.


5. INNOVATIVE USE OF IT FOR WEB ENABLED 
APPLICATIONS 

‘Information Technology’ has been innovatively used for web
based solutions in DGFT not merely to Automate and
Informate but to Transformate the entire value chain of trade
processes.
The Web Enabled Reengineered Work Flow among the various
stakeholders is shown in Figure 2.
The innovative technology intervention has also led to the
strengthening and almost complete compliance of DGFT’s
website with the stipulated web guidelines of the Ministry of
Information Technology.

The innovative use of Information Technology (IT) has been in 
the following areas: 


5.1 “On-line” Filing of Applications:

• Flexibility has been provided in ‘e-filing’ of applications
through both modes, i.e. ‘on-line’ and ‘off-line’. The
‘e-filing’ facility covers all authorizations in ‘on-line’
mode. To maintain a high level of server response, an
‘off-line’ data entry module for Advance Authorizations (AA)
and Export Promotion Capital Goods (EPCG) has also
been made available on the website. This hybrid approach
has enhanced flexibility and eased operations significantly.


5.2 The EDI Linkages (‘On-line’ Message Exchange with
Various Trade Partners):

“Message Exchange” with Customs, Banks and EPCs is
Digitally Signed. The Message Exchange design includes a
structured and comprehensive monitoring and tracking system
comprising acknowledgment and error message flagging.

• An appropriate communication technology has been used
for different network partners based on user profile and
technical and process requirements.

• An ‘ePayment’ facility for Authorization fee payment
with various banks having Net Banking Facility is
available. The number of participating banks is
being expanded further to enlarge coverage and scope.


5.3 TECHNOLOGY UPGRADATION:

Technology upgradation of hardware, software and
networking is a continuous exercise. The last upgradation was
done in 2011. The present technology profile supporting the web
service is as under:

• J2EE technology (Applet, Servlet, Enterprise Java Beans,
JSP, ASP), XML, and IBM DB2 as database with digital
signature.

• JBuilder 2007 and J2SDK/J2SEE tools for applications
development.

• IBM WebSphere and Macromedia JRun Web Server for
application servers.

• Rational Suite is used for documenting,
designing and developing the application.

• The website is being updated using the Extensible Markup
Language (XML) technology.

• The whole DGFT organization is connected to internet /
intranet / VPN through very high speed connectivity with
NICNET infrastructure.


5.4 ADOPTION OF COMPREHENSIVE TECHNOLOGY 
MANAGEMENT PRACTICES 

‘On-line’ data backup / archiving is done regularly so as to
ensure that only 2 years of data is available for ready access and
the rest is archived. This improves the server response.

• A Data Warehouse of the complete license database has been
built, from which data can be retrieved as and when required
using Data Mining technology.

• A Disaster Recovery Site has been installed and is
maintained at the National Informatics Centre (Regional
Office, Hyderabad).

• The site is maintained regularly to ensure 100% uptime.


6. THE IMPACT ASSESSMENT 

6.1 REDUCTION IN TRANSACTION TIME 

• Cost of preparation of an application for an exporter almost
brought down to 0.

• Time required to prepare an application has come down to
5 minutes from 5 hours on an average.

• Processing time of an application has come down to 1 hour
instead of 45 days.

• Message Exchange for Authorizations has brought down
license verification time from 6 months to automatic
instantaneous verification.

• Status tracking of applications only a click away.

• Need of paper eliminated completely for application.


6.2 REDUCTION IN TRANSACTION COST 

• Application can be filed from anywhere.

• Visits of exporters / their representatives to DGFT offices
have been reduced to a minimum.

• Trade related documents have been streamlined and
reengineered to enhance transparency with no redundancy.

• Physical documents dispensed with due to integration of
digital signature with the system.

• Application fee has been halved for ‘on-line’ application.

• Paper cost brought down by 80%.

• Physical interface with exporters being reduced further
through video conferencing.

• Journey to a paperless and green DGFT fast tracked.


7. THE IMPACT SCORE CARD OF TRADE FACILITATION NOWADAYS

Table 1: “Impact Score Card”
8. EXTERNAL RECOGNITION
• Runner-up for the E-Asia Award (AFACT 2005), Taiwan [8].

• Participated in ICT Solutions for Good Governance, 2004, in
Hyderabad.


CONCLUSION AND FUTURE SCOPE 

In this paper, we have presented the effects of innovative
technology in the Directorate General of Foreign Trade (DGFT),
Ministry of Commerce and Industry, Govt. of India, which help
an importer/exporter obtain the maximum benefit in the minimum
time, without physical visits to Government offices and within
a transparent environment. The Government of India, Department
of Electronics and Information Technology, has initiated a
national e-governance plan for the execution of e-governance
projects in the country. In the same manner, we have applied
the latest techniques in DGFT to move towards successful
e-governance. The results and outcomes have been presented to
demonstrate the major impact of innovative technologies in the
government sector.

*Note:

The author is posted in the Directorate General of Foreign Trade
and is a senior member of the technical team automating the
DGFT organization.
ABSTRACT

Spin transport in nanostructured devices depends on interface
resistance, electrode resistance, Spin polarization and Spin
diffusion length. The Spin Hall Effect (SHE), caused by Spin-orbit
scattering in nonmagnetic conductors, gives rise to the
conversion between Spin and charge currents in a nonlocal
device. Recently, the SHE has been observed using nonlocal Spin
injection in metal-based nanostructured devices, which paves
the way for future Spin electronic applications. In the present
work we theoretically analyze the SHE phenomenon
based on the experimental results obtained to date. We use
the Hamiltonian of two-dimensional electron systems with
Rashba Spin-orbit coupling. We undertake a quantitative
analysis of the Spin Hall Effect in low-dimensional materials
using Spin dynamical equations and the Spin Hall conductivity.


KEYWORDS 
Spin transport, Spin Hall Effect, nanostructures. 
PACS: 75.76.+j, 73.43.-f, 73.63.-b.


1. INTRODUCTION 

Spin-dependent transport phenomena in nanostructures are of
great interest for potential applications in Spin electronic
devices [1]. Recently much attention has been paid to the Spin
Hall Effect, which allows the polarization of electron Spins in
nanomaterials [2-6]. In the Spin Hall Effect, electrically
induced Spin polarization accumulates near the edges of a
channel and is zero in its central region. This effect is caused
by deflection of carriers, moving along an applied electric field,
by extrinsic [3] and/or intrinsic [4] mechanisms. In a
nonmagnetic homogeneous system, Spin accumulation is not
accompanied by a charge voltage, because the two Spin Hall
currents due to Spin-up and Spin-down electrons cancel each
other [2]. The absence of a transverse voltage makes the Spin
Hall Effect difficult to probe: measuring a charge
accumulation is much easier than measuring a Spin
accumulation.

Recently, the Spin Hall Effect has been observed both optically
[5] and electrically [6]. Valenzuela and Tinkham [6] have
reported electrical measurements of the Spin Hall Effect in a
diffusive metallic conductor, using a ferromagnetic electrode in
combination with a tunnel barrier to inject a Spin-polarized
current. An induced voltage was observed that results
exclusively from the conversion of the injected Spin current
into charge imbalance through the Spin Hall Effect. Such a
voltage is proportional to the component of the injected Spins
that is perpendicular to the plane defined by the Spin current
direction and the voltage probes. In a Spin-orbit-coupled
system, a non-zero Spin current is predicted in a direction
perpendicular to the applied electric field, giving rise to a Spin
Hall Effect [7, 8]. Consistent with this effect, electrically
induced Spin polarization was recently detected by optical
techniques at the edges of a semiconductor channel [9] and in
two-dimensional electron gases in semiconductor
heterostructures [10, 11].

Efficient Spin injection, Spin accumulation, Spin transfer and
Spin detection are key factors in utilizing the Spin degree of
freedom as a new functionality in Spin electronic devices. By
analyzing the Spin transport in the structure, we obtain the
optimal conditions for Spin accumulation and Spin current. The
injection of Spin-polarized electrons and the detection of Spin
accumulation depend strongly on the nature of the junction
interface.

Theoretical studies of the quantum Spin Hall Effect in solid
systems mainly concern metallic graphene and
semiconductor systems with strain gradients. However,
since the Spin-orbit interaction in graphene is too small, the
theoretical proposals for such systems are difficult to
realize in experiment. The quantum Spin Hall regime is also very
difficult to achieve in semiconductor systems with strain
gradients, due to the demanding requirement of a large strain
gradient with a special configuration and a very low electron
density in a clean environment. Spintronics in
semiconductors is richer scientifically than Spintronics in
metals because doping, gating, and heterojunction formation
can be used to engineer key material properties and because of
the intimate relationship in semiconductors between optical and
transport properties. Spin transport in nanostructured devices
depends on interface resistance, electrode resistance, Spin
polarization and Spin diffusion length. The Spin Hall Effect (SHE),
caused by Spin-orbit scattering in nonmagnetic conductors,
gives rise to the conversion between Spin and charge currents
in a nonlocal device.

The optical coherent control method provides remarkable
controllability over the dynamics of atomic Spin states.
Furthermore, parameters of cold atomic systems, e.g. atomic
number and atom-atom interaction strength, can be well controlled
in current experiments. This makes it possible to control the
atomic Spin propagation through optical methods, and further to
demonstrate the quantum Spin Hall Effect (SHE) in a neutral
atomic system.

A quantitative analysis of the Spin Hall Effect can also be
carried out by electrical measurement in completely nonmagnetic
systems, without injection of Spin-polarized electrons. A
comparative study of requirement tools, showing trends in the
use of methodology for gathering, analyzing, specifying and
validating software requirements, is given in
[17], which has helped us in theoretically analyzing the Spin
Hall Effect in low-dimensional materials using Spin dynamical
equations and the Spin Hall conductivity.


2. RESULTS AND DISCUSSIONS 
In high mobility two-dimensional electron systems (2DES) that
have substantial Rashba Spin-orbit coupling [12], Spin currents
always accompany charge currents. The Hamiltonian of a
2DES with Rashba Spin-orbit coupling is given by

$$H = \frac{p^{2}}{2m} - \frac{\lambda}{\hbar}\,\vec{\sigma}\cdot(\hat{z}\times\vec{p}) \qquad (1)$$

where $\lambda$ is the Rashba coupling constant, $\vec{\sigma}$ represents the Pauli
matrices, $m$ is the electron effective mass, and $\hat{z}$ is the unit
vector perpendicular to the 2DES plane. The Rashba coupling
strength in a 2DES can be modified by as much as half by a
gate field [13]. The above discussion is valid even when the
atomic number, Z, is not equal to one (hydrogen). Recent
observations of a Spin-galvanic effect and a Spin-orbit
coupling induced metal-insulator transition in these systems
[14] illustrate the potential importance of this tunable
interaction in semiconductor Spintronics [15].
The dynamics of an electron Spin in the presence of
time-dependent Zeeman coupling is described by the Bloch
equation:

$$\hbar\,\frac{d\hat{n}}{dt} = \hat{n}\times\vec{\Delta}(t) + \alpha\,\hbar\,\frac{d\hat{n}}{dt}\times\hat{n} \qquad (2)$$

where $\hat{n}$ is the direction of the Spin and $\alpha$ is a damping
parameter that we assume is small. For the application we have
in mind, the p-dependent Zeeman coupling term in the Spin
Hamiltonian is $-\vec{s}\cdot\vec{\Delta}/\hbar$, where $\vec{\Delta} = (2\lambda/\hbar)(\hat{z}\times\vec{p})$. The Spin-orbit
interaction is a purely relativistic effect, which is derived
from the Dirac equation.

The Spin Hall (SH) conductivity $\sigma_{sH}$ is given by the
following equation [4]:

$$\sigma_{sH} \equiv -\frac{j_{s,y}}{E_{x}} = \frac{e}{8\pi} \qquad (3)$$

which is independent of both the Rashba coupling strength
and the 2DES density.


Under homogeneous charge and current density conditions:


Jy = rwyt eDV E Y br 


where the Spin conductivity is

$$\sigma_{\uparrow\downarrow} = e\, n_{\uparrow\downarrow}\, \mu$$

where $\sigma_{\uparrow}$ ($\sigma_{\downarrow}$) is the Spin conductivity due to Spin-up (↑) and
Spin-down (↓) electrons respectively, and $n_{\uparrow}$ ($n_{\downarrow}$) is the electron
density of states for Spin-up (↑) and Spin-down (↓) alignment.

The current coupled to the electric field $E_{y}$, which is
responsible for the Spin Hall Effect, is

Iar = engi Eo. 

Thus, 

lory = Ay En 

This shows that the Spin conductivity depends directly on the
electric field and the current I.

Tin lory Ps 

which, when integrated to obtain the Hall voltage, gives

£ m Ayn PJs 

where ω is the width of the Spin Hall device.

The transverse Hall voltage as a function of the longitudinal
electric field is plotted in Figure 1.


Figure 1: “Transverse voltage as a function of the longitudinal
electric field”


The above result shows that the average profile curve between
the Hall voltage and the longitudinal electric field is
essentially linear. Certain other models have also been
proposed in [18], in which novel approaches for developing a
more efficient relationship model between time and efficiency
have motivated us to model the above work.

In diffusive normal metals, the SHE is known to be induced by
Spin-orbit scattering originating as an extrinsic effect due to
impurities or defects [16]. Since the optical detection technique
is limited to semiconductor systems, electrical detection is
the only way to access the SHE in diffusive metals. Nonlocal
Spin injection in nanostructured devices provides a new
opportunity for observing the Spin Hall Effect. If Spin-polarized
electrons flow in a nonmagnetic electrode, these electrons are
deflected by Spin-orbit scattering, inducing Spin and charge
Hall currents in the transverse direction and accumulating Spin
and charge at the edges.


FUTURE SCOPE & CONCLUSION 
In nano devices the distribution of the current across the
interface depends on the relative magnitude of the interface
resistance to the electrode resistance. When the interface
resistance is much larger than the electrode resistance, as in
tunnel junctions, the current distribution is uniform in the
contact area, which validates the assumption of uniform
interface current. However, when the interface resistance is
comparable to or smaller than the electrode resistance, as in
metallic contact junctions, the interface current has an
inhomogeneous distribution with a high current density around
a corner of the contact. Using nonlocal Spin injection, a
pure Spin current is created in nonmagnetic conductors, so that
we have the opportunity to observe the Spin-current induced
SHE in nonmagnetic conductors via the Spin-orbit scattering
by nonmagnetic impurities. The observation of the SHE
provides direct verification of the existence of Spin current
flowing in nonmagnetic conductors. Conversely, the
electrical current creates the Spin current via the SHE, which
provides a Spin-generating source without the need to use
ferromagnetic materials. Nonlocal Spin injection also
makes it possible to realize nonlocal Spin manipulation. The
advantages of nonlocal lateral structures are flexibility of the
layout and the relative ease of fabricating multi-terminal
devices with different functionalities. The development of
nonlocal Spin devices is a new challenge in the research field
of Spin electronics.

This result opens up a new possibility to use normal metals 
with high Spin-orbit coupling as Spin current sources operating 
at room temperature for the future Spintronic applications. 
ABSTRACT

The issue of performance evaluation and prediction has
concerned users throughout the history of computer
evolution. In recent times the parallel computer has been gaining
popularity as an effective solution for low cost supercomputing.
In this paper we discuss a simulation study performed to
evaluate the performance of parallel computers connected in
different topologies.


KEYWORDS
Performance measures, Processor utilization, System
utilization, Throughput.


1. INTRODUCTION

The future need for much more powerful supercomputing
calls for parallel (digital) computers containing a large number
of fast processors that can cooperate quickly and efficiently. In
parallel processing, high performance data processing and data
flow are of equal importance. In practice so far, loss of
efficiency often happens for the technical reason that the
communication system of a parallel computer does not have enough
capacity. Lack of communication capacity will result in
transfer bound processing instead of computation bound
processing. Loss of efficiency also often happens because a
parallel algorithm is still in the early stage of development.
That makes it difficult to define the architecture and
programming of a parallel computer such that efficient
implementation of parallel algorithms is possible in a wide
range of applications. Moreover, the applicability of parallel
computation is hampered, since programming parallel
computers is still more difficult than programming serial
computers.

The need for computer performance evaluation exists from the
initial conception of a system’s architectural design to its daily
operation after installation. In the early planning phase of a
new computer system product, the manufacturer usually makes
two types of predictions. The first type is to forecast the nature
of applications and the levels of system workloads of these
applications. Here, the term workload means the amount of
service requirements placed on the system. The second type of
prediction is concerned with the choice between architectural
design alternatives, based on hardware and software
technologies that will be available in the design period of the
planned system. Here the criterion of selection is known as the
cost-performance trade-off. The accuracy of such prediction rests,
to a considerable extent, on the capability of mapping the
performance characteristics. Such translation procedures are by
no means straightforward or well-established. After the
architectural decisions have been made and the system design
and implementation started, the scope of performance
evaluation becomes more specific. The interactions among the
operating system components (algorithms for job scheduling,
processor scheduling, and storage management) must be dealt
with, and their effects on the performance must be predicted.
Comparing the predicted performance with achieved
performance often reveals major defects in the design or errors
in the system programming. Now, it is universally accepted
that the performance evaluation and prediction process should
be an integral part of the development efforts, throughout the
design and implementation activities.


2. MEASURES OF PERFORMANCE 

When it is said that the performance of a computer is great,
it means, perhaps, that the quality of service delivered by the
system exceeds the expectation. But the measure of service
quality and the extent of expectations vary depending on the
individuals involved, eg, system designers, installation
managers, terminal users, etc. If an attempt is made to measure
the quality of computer performance in the broadest context,
then issues like user response (as well as the system response),
ease of use, reliability, user’s productivity, etc must be
considered as integral parts of the system’s performance.
Since such a performance analysis cannot avoid issues that are
ultimately behavioural, performance is discussed here only in
terms of clearly measurable quantities. This is the
conventional approach; for instance, the signal-to-noise ratio
and the probability of decoding errors are used as measures of
the performance of communication systems.

The performance measures can be classified into two broad 
categories: 

(i) user oriented measures, and 

(ii) system oriented measures. 

The user oriented measures include such quantities as the
turnaround time in a batch system environment and the
response time in a real time and/or interactive environment.
The turnaround time is the length of time that elapses from the
submission of the job until the availability of its processed
result. In a similar way, in an interactive environment, the
response time of a request represents the interval that elapses
from the arrival of the request until its completion in the
system.

Usually jobs are categorized according to their priority classes.
Many factors may determine the assignment of priority to a
job: the job’s urgency, its importance, and its resource demand
characteristics and utilization.

Throughput is defined as the average number of jobs processed
per unit time. It indicates the degree of productivity that the
system can provide. But in this case, throughput is not an
adequate measure of performance; rather it is a measure of
system workload.


2.1 SYSTEM UTILIZATION 

In an execution cycle, all the processors may not participate in
execution; some may be idle throughout an execution cycle,
waiting for results from other processors. The utilization of the
system in terms of the number of processors used in an
execution cycle is quantified by the parameter $S_u$, which is
referred to as system utilization.

Consider an algorithm which is executed in r cycles
on P processors. Suppose, in an execution cycle of $t_1$ time
units, $P_1$ processors are used, in the next execution cycle of
$t_2$ time units, $P_2$ processors are used, and so on. Then,

$$S_u = \frac{P_1 t_1 + P_2 t_2 + \cdots + P_r t_r}{P\,(t_1 + t_2 + \cdots + t_r)}$$
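
The system utilization formula above can be sketched in C as follows. This is a minimal illustration only; the function name system_utilization and its parameter names are assumptions introduced here and are not taken from the simulator.

```c
#include <stddef.h>

/* Sketch: system utilization over r execution cycles.
 * used[i]     : number of processors active in cycle i (P_i)
 * duration[i] : length of cycle i in time units (t_i)
 * total_procs : number of processors in the system (P)
 * Returns sum(P_i * t_i) / (P * sum(t_i)), or 0.0 when the
 * processor count or the total time is not positive.
 * All names here are hypothetical. */
double system_utilization(const int *used, const double *duration,
                          size_t r, int total_procs)
{
    double busy = 0.0;   /* accumulates P_i * t_i */
    double total = 0.0;  /* accumulates t_i       */

    for (size_t i = 0; i < r; ++i) {
        busy  += used[i] * duration[i];
        total += duration[i];
    }
    if (total_procs <= 0 || total <= 0.0)
        return 0.0;
    return busy / (total_procs * total);
}
```

For example, with P = 4 and three cycles using 4, 2 and 1 processors for 1, 2 and 1 time units respectively, the busy processor-time is 9 out of an available 16, so the utilization is 9/16.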


2.2 PROCESSOR UTILIZATION 

When the sub-domains assigned to different processors are not
equal, some processors finish computation earlier than
others. As synchronization takes place at the end of every
cycle, these processors wait for the others to finish. This leads to
idling and under-utilization of some processors, which is
quantified by the parameter $P_u^{i}$ for processor i. It characterises
the load balancing of the system. Perfect load balancing occurs
when the sizes of the sub-domains assigned to all the
processors are equal, ie, when $P_u^{i} = 1$ for $i = 1, 2, \ldots, P$ (where P
is the number of processors in the system).


2.3 INTER-PROCESSOR COMMUNICATION TIME 

In a message-passing multiprocessor, if $t_{startup}$
represents the message start-up overhead or latency and $t_{send}$
represents the transmission time per byte (which is the inverse
of the link bandwidth), then sending k bytes between two
neighbouring processors involves a communication time

$$t_{comm} = t_{startup} + t_{send} \times k$$

When the communication is not between two near neighbours,
the communication time is estimated by assuming that it takes
place in hops, and each hop corresponds to a near neighbour
communication. The communication time between two
processors is then $n \times t_{comm}$, where n is the number of hops by
which the two processors are separated.
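
The communication-time estimate above translates directly into code. The sketch below is illustrative only; the function names neighbour_comm_time and multihop_comm_time are assumptions and do not come from the simulator.

```c
/* Sketch: time to send k bytes to a near neighbour,
 * t_comm = t_startup + t_send * k, where t_send is the
 * transmission time per byte. Names are hypothetical. */
double neighbour_comm_time(double t_startup, double t_send,
                           unsigned long k)
{
    return t_startup + t_send * (double)k;
}

/* Sketch: time for the same message over n hops, each hop
 * being charged as one near-neighbour communication. */
double multihop_comm_time(double t_startup, double t_send,
                          unsigned long k, unsigned n)
{
    return (double)n * neighbour_comm_time(t_startup, t_send, k);
}
```

With a start-up overhead of 10 time units and 0.5 time units per byte, a 100-byte message costs 60 time units per hop, so a 3-hop transfer is estimated at 180 time units.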


3. ANALYSIS OF PARALLEL ALGORITHMS 

Once an algorithm for a new problem has been developed, it is
usually evaluated using the following criteria: running time,
number of processors used, and cost¹. Besides these standard
metrics, a number of other technology related measures are
sometimes used when it is known that the algorithm is destined
to run on a computer based on that particular technology.

Running Time

As speed is the main reason behind the
growing interest in the field of parallel computers, the most
important measure of a parallel algorithm is, therefore, the
running time. According to Akl¹, running time is defined as
the time taken by the algorithm to solve a problem on a
parallel computer, that is, the time elapsed from the moment
the algorithm starts to the moment it terminates. If the various
processors do not begin and end their computation
simultaneously, then the running time is equal to the time
elapsed between the moment the first processor to begin
computing starts and the moment the last processor to end
computing terminates.

In evaluating a parallel algorithm for a given problem, it is
quite natural to do it in terms of the best available sequential
algorithm for that problem. Thus a good indication of the
quality of a parallel algorithm is the ‘speed-up’ it produces.
This is defined as

Speed-up = (worst-case running time of the fastest known
sequential algorithm for the problem) / (worst-case running time
of the parallel algorithm).
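
The two definitions above (running time when processors start and stop at different moments, and speed-up) can be sketched as below. The function names and the use of per-processor start and end timestamps as arrays are assumptions made for illustration.

```c
#include <stddef.h>

/* Sketch: running time of a parallel algorithm on p processors,
 * measured from the earliest start to the latest end.
 * start[i] and end[i] are hypothetical per-processor timestamps. */
double parallel_running_time(const double *start, const double *end,
                             size_t p)
{
    if (p == 0)
        return 0.0;

    double first = start[0];  /* earliest start seen so far */
    double last  = end[0];    /* latest end seen so far     */
    for (size_t i = 1; i < p; ++i) {
        if (start[i] < first) first = start[i];
        if (end[i]   > last)  last  = end[i];
    }
    return last - first;
}

/* Sketch: speed-up as the ratio of the sequential worst-case time
 * to the parallel worst-case time; returns 0.0 if the parallel
 * time is not positive. */
double speed_up(double t_seq_worst, double t_par_worst)
{
    if (t_par_worst <= 0.0)
        return 0.0;
    return t_seq_worst / t_par_worst;
}
```

If three processors start at times 0, 2 and 1 and finish at 5, 9 and 7, the running time is 9; against a sequential worst case of 36 time units this gives a speed-up of 4.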


3.1 NUMBER OF PROCESSORS 

The second most important criterion in evaluating a parallel
algorithm is the number of processors it requires to solve a
problem. It costs money to purchase, maintain and run
computers. When several processors are present, the problem
of maintenance, in particular, is compounded, and the price
paid to guarantee a high degree of reliability rises sharply.
Therefore, the larger the number of processors an algorithm uses
to solve a problem, the more expensive it becomes to obtain
the solution. For a problem of size n, the number of processors
required by an algorithm, a function of n, will be denoted by
p(n). Sometimes the number of processors is a constant
independent of n.


4. IMPLEMENTATION 

In traditional implementations of parallel programs, there is
often no way of ensuring that the code implements the designer’s
intentions. For example, a simple typographical mistake
during coding can cause two processors to communicate when
they should not, leading to disastrous, unpredictable
consequences. If the design specifications could somehow be
fed directly to the language processor, this unintended
communication could be diagnosed syntactically. In order to be
viable, the design must be formally defined as a computer
language.


5. DESIGN OF SIMULATOR

In this simulator, a multiprocessor environment is simulated to
evaluate the performance of different standard computations
under various topologies. All the standard topologies like bus,
ring, torus, hypercube, mesh, and tree are considered.

The simulation is done in the C language.


5.1 ASSUMPTIONS 

The model proposed here for performance prediction assumes
that all inter-processor communication times can be estimated
a priori and that there are no unpredictable queuing delays in
the system. An input file, having two fields containing
processor-ID name and process, and also the communication
file are available. It is also assumed that any process can
complete its message passing in one communication cycle if
the route is free and the receiving process is ready.
5.2 MODEL 
The input to this simulator is given after balancing load with a
suitable load balancing technique. At each processor two
queues are maintained: a ready queue and a communication
queue. In the beginning the ready queue at each processor
contains all the processes assigned to that processor and the
communication queue is kept empty. The round-robin job
scheduling technique is followed at each processor, ie, each
process at a processor is given a time slice for execution. An
execution cycle is followed by a communication cycle. In the
communication cycle, the processes requiring communication
among themselves communicate. Before any two processes
communicate, first the links connecting them through the
shortest path are examined. Then the communication queue of
the partner processor is searched for the partner process. If it
is found there, the communication delay is added to the
respective counters and the partner process is removed from
the front of the ready queue and is placed at the rear of the
ready queue. When all the queues are exhausted the program
terminates. Computation time is added to each process at the
end of each computation cycle. The simulator also calculates
the time of completion of each queue, by adding the execution
time of all the processes at each processor separately.
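
The round-robin handling of a ready queue described above can be sketched with a singly linked queue. This is a simplified illustration, not the simulator's code: the types proc_node and queue and the functions enqueue, dequeue and rotate_ready_queue are hypothetical names, and the process record is reduced to an identifier.

```c
#include <stddef.h>

/* Hypothetical, reduced process record: only an id and a link. */
struct proc_node {
    int id;
    struct proc_node *next;
};

/* Queue with head (front) and tail (rear) pointers. */
struct queue {
    struct proc_node *head, *tail;
};

/* Append a process at the rear of the queue. */
void enqueue(struct queue *q, struct proc_node *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Remove and return the process at the front, or NULL if empty. */
struct proc_node *dequeue(struct queue *q)
{
    struct proc_node *p = q->head;
    if (p) {
        q->head = p->next;
        if (!q->head)
            q->tail = NULL;
        p->next = NULL;
    }
    return p;
}

/* One round-robin step: the front process receives its time slice
 * and is then moved to the rear. Returns its id, or -1 if the
 * ready queue is empty. */
int rotate_ready_queue(struct queue *ready)
{
    struct proc_node *p = dequeue(ready);
    if (!p)
        return -1;
    /* ... the process would execute for one time slice here ... */
    enqueue(ready, p);
    return p->id;
}
```

Keeping both head and tail pointers makes the move from front to rear a constant-time operation, which matters because it happens once per time slice.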
The different parameters and structures are described as
follows.
Structure processor includes
(i) Current state of processor, ie, 0 for over, 1 for ready
and 2 for idle;
(ii) time_stamp, clock, and a link clock for each link; and
(iii) Three process queues. Each queue has its own count.
(a) Ready queue of active processes waiting for
communication.
(b) Communication queue of inactive processes waiting
for communication.
(c) Wait queue of inactive processes waiting to be
created as threads.
proc_arr is a dynamically allocated array of processors.
The declaration for the above is made as follows.
struct processor {
    int current_state;
    /* 'unsigned double' is not a valid C type; double is used */
    double time_stamp, clock, *link_clk;
    int ready_process_count, comm_process_count,
        wait_process_count;
    struct process *ready_q_tl, *wait_q_tl, *comm_q_tl;
} **proc_arr;


Structure process includes

(i) process_id: identification of the process;

(ii) Priority of the process: 1 if urgent, else 0;

(iii) Current state of process: 0 if over and 1 if ready;

(iv) partner_proc: communication partner processor;
partner_process: communication partner process;

(v) Instruction queue and instruction count; and

(vi) Pointer to the ‘next’ process in the linked list.

The structure process is defined as follows.

struct process {
    int priority, process_id, inst_count, current_state;
    int partner_proc, partner_process;
    double clock;
    struct process *next;
    struct instr_list *instr_hd, *instr_tl;
};

The structure instruction list includes

(i) Type: integer value indicating the type of instruction;

(ii) Params array: parameters required for that instruction; and

(iii) Pointer to the next instruction in the linked list.

The declaration for instr_list is given as follows:
struct instr_list {
    int type;
    int params[4];
    struct instr_list *next;
};


Other variables declared include t_calc, which stores the total
computation time, ie, the time required to run the same
application on a single processor.
Initialize ( )

The initializing subroutine is a semi-interactive subroutine
which initializes all parameters used afterwards by the
simulator. Here the number of nodes/processors, the type of
topology, the processor used and its frequency are taken as
input. The clock used is also initialized, as are all the
process counts.

get_link(int sp, int dp)

This subroutine takes the source and the destination processor
as input (as well as the topology) and returns the communication
link between those two processors. Here popular topologies like
mesh, star, hypercube, tree, torus and wk_recursive are
considered, as well as logical topologies like ring, pipeline, etc.
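
The number of hops separating two processors depends on the topology. The sketch below computes shortest-path hop counts for three of the topologies mentioned, under assumed numbering schemes (nodes numbered consecutively around the ring, row by row in a mesh without wraparound links, and by binary address in a hypercube). The function names are hypothetical and this is not the simulator's get_link routine.

```c
#include <stdlib.h>

/* Ring of n nodes: the shorter way round. */
int ring_hops(int a, int b, int n)
{
    int d = abs(a - b);
    return d < n - d ? d : n - d;
}

/* 2-D mesh without wraparound, `cols` nodes per row, numbered
 * row by row: row distance plus column distance. */
int mesh_hops(int a, int b, int cols)
{
    return abs(a / cols - b / cols) + abs(a % cols - b % cols);
}

/* Hypercube: number of address bits in which a and b differ. */
int hypercube_hops(unsigned a, unsigned b)
{
    unsigned x = a ^ b;
    int hops = 0;
    while (x) {
        hops += (int)(x & 1u);
        x >>= 1;
    }
    return hops;
}
```

A hop count obtained this way could serve as the multiplier n in the multi-hop communication time estimate of Section 2.3.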
read_input(char *filename)

This function reads the input file given as a command line
argument. The input file, which is the output file of the parser,
contains declarations of various dummy statements. The parser
replaces the actual parallel C statements by these dummy
statements, considering the worst case of execution. Here the
queues for different processors are set up to be used by the
simulator. There are several smaller procedures doing different
tasks.

void create(int *pro_arr)

This creates another (thread) process. The thread, which was
initially stored in the wait queue, is moved to the ready queue.
Delay is added to the process and processor clocks, and the
process is put at the end of the ready queue.

void send(int *par_arr)

This performs the communication operation ‘send’. At first the
partner processor of the process is updated. Then the
communication queue of the partner processor is searched to
find out whether it contains the partner process or not. If the
partner process is found, the corresponding communication
delay is added to both processes. The partner process is
removed from the communication queue of the partner
processor and is put in its ready queue, and the ready queue and
communication queue of the partner processor are updated. On
the other hand, if the partner process is not found in the
communication queue of the partner processor, then the current
process is put in the communication queue of the current
processor. If the partner process is found, the
corresponding link delays are added, and the current process is
put at the end of the ready queue.

void receive(int *pro_arr)

This function performs the communication operation ‘receive’.
Here, first the communication queue of the partner processor is
searched for the partner process. If it is found, the
communication delay is added to both processes. The
partner process is removed from the communication queue of
the partner processor and is put in its ready queue. The
communication queue and ready queue of the partner processor are
updated. If the partner process is not found in the communication
queue of the partner processor, then the current process is put in the
communication queue of the current processor. If the partner
process is found, the corresponding link delays are
added and the current process is put at the end of the ready queue (the
text is available with the author).

There are several other procedures to add computational delay,
communication delay, link delay, etc, and a procedure link send,
which changes the state of a process and removes it from the queue.
Simulate ( )

This module does the simulation work; a file is opened to
write the instructions as executed by the simulator. It starts on
processor zero. If the time_stamp of the current processor is greater
than the allotted time slice, or if its ready process queue is empty,
then the procedure processor schedule is called; else the procedure
process schedule is called.

The processor scheduler finds the processor with the minimum
clock as the new current processor. It uses a linear search
for this purpose. The process scheduler finds the process
on the current processor having urgent priority and places it at
the head of the ready queue of the current processor so as to
execute it next (details available with the authors). The simulator
continues till all instructions of all the processes are over.
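
The processor scheduler's linear search for the minimum clock can be sketched as follows. The function name next_processor and the representation of the processor clocks as a plain array are assumptions made for illustration; ties are resolved here in favour of the lowest index, which the source does not specify.

```c
#include <stddef.h>

/* Sketch: pick the processor whose clock is smallest, using a
 * linear search. Returns its index, or -1 if there are no
 * processors. On a tie the lowest index wins (an assumption). */
int next_processor(const double *clocks, size_t n)
{
    if (n == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (clocks[i] < clocks[best])
            best = i;
    }
    return (int)best;
}
```

Always advancing the processor that is furthest behind in simulated time keeps the per-processor clocks roughly in step, so that communication partners are not simulated far apart in time.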
Statistics ( )

This procedure calculates all the statistical information and
stores it in a file. The time estimated for the application to
run on a single processor and the overall efficiency are
calculated. The maximum of all processor clocks (details
available with the author) is also calculated.


Algorithm
Begin
initialize( )   // Initialize various parameters and variables.
read_input( )   // Read input from the designated file.
simulate( )     // Start the simulation.
statistics( )   // Transfer the desired results to a predetermined file.
End
6. DISCUSSION 


In this model, the processors execute a computation and,
after finishing, synchronize and perform data exchange in
a cycle. If, during execution of an algorithm, all the processors
are performing computations in all cycles, then the system
utilisation is 1. However, it is found that in some algorithms all
the processors may not participate in computation in all the
cycles, as some processors may be waiting for the results
generated by other processors. The system utilisation for such
algorithms is less than one.

The level of detail required in the validation of a simulator
should depend on how that simulator is to be used in decision
making. If the performance measure thus obtained has some
mean value (e.g., CPU utilization, the average response time),
then the notions of significance level and confidence interval
should be applied to quantify the statistical significance of the
difference between measured and simulated effects. The
analysis of variance technique can also be used to test the
hypothesis.


CONCLUSION 

The model discussed here determines the performance of a
static system. With some modifications, it can be made to
work in a dynamic environment also. The model has
some limitations. Its advantage is that it helps smaller
processes to complete execution by providing them time
slices. In many cases the intermediate results provided by such
processes are used by the other processes to continue execution.
Since in most cases parallel computers are used for similar
kinds of jobs repeatedly, the execution cycle can be varied, by
monitoring the communication pattern, to reduce the context
switching overhead.
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Editorial 


It is a matter of both honor and pleasure for us to put forth the ninth issue of
BIJIT, the BVICAM's International Journal of Information Technology. It
presents a compilation of eleven papers that span a broad variety of research
topics in various emerging areas of Information Technology and Computer
Science. Some application oriented papers, having novelty in application, have
also been included in this issue, hoping that usage of these would further enrich
the knowledge base and facilitate the overall economic growth. This issue shows
our commitment in realizing our vision “to achieve a standard comparable to
the best in the field and finally become a symbol of quality”.


As a matter of policy of the Journal, all the manuscripts received and considered
for the Journal by the editorial board are double blind peer reviewed
independently by at least two referees. Our panel of expert referees possesses a
sound academic background and has a rich publication record in various
prestigious journals representing Universities, Research Laboratories and other
institutions of repute, which we intend to further augment from time to time.
Finalizing the constitution of the panel of referees, for double blind peer
review(s) of the considered manuscripts, was a painstaking process, but it helped
us to ensure that the best of the considered manuscripts are showcased, and that
too after undergoing multiple cycles of review, as required.


The eleven papers that were finally published were chosen out of seventy-nine
papers that we received from all over the world for this issue. We understand
that the confirmation of final acceptance, to the authors / contributors,
is sometimes delayed, but we also hope that you concur with us in the fact that
quality review is a time-consuming process and is further delayed if the reviewers
are senior researchers in their respective fields and hence are hard pressed for
time.


We further take pride in informing our authors, contributors, subscribers and
reviewers that the journal has been indexed with some of the world’s leading
indexing / bibliographic agencies like EBSCO (USA), Open J-Gate (USA), DOAJ
(Sweden), Google Scholar, WorldCat (USA), Cabell’s Directory of Computer
Science and Business Information System (USA), Academic Journals Database,
Open Science Directory, Indian Citation Index, etc. and listed in the libraries of
the world’s leading Universities like Stanford University, Florida Institute of
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Abstract - For production systems like expert systems, a rule 
generation software can facilitate the faster deployment. The 
software process model for rule generation using decision tree 
classifier refers to the various steps required to be executed 
for the development of a web based software model for 
decision rule generation. The Royce’s final waterfall model 
has been used in this paper to explain the software 
development process. The paper presents the specific output 
of various steps of modified waterfall model for decision rules 
generation. 


Index Terms - Software Process model, Modified waterfall 
model, Decision Rule, Decision Tree 


1. INTRODUCTION 

Classification is the discovery of a predictive learning model
that classifies a data item into one of several predefined classes
[4]. The classification model can predict the class of objects
whose class label is unknown. It is also called a classifier [8,16].
Patil et al. have done work on fault classification of
mechanical systems using self organizing techniques [16].
Verma et al. have used rough set techniques for a 24 hour
knowledge factory [17]. But none of these algorithms is
available online. We have made an attempt to develop a software
process model for online rule generation. The model can be
used by other researchers to make online software for their
own algorithms.

A decision tree is a classifier expressed as a recursive partition
of the instance space. It consists of nodes that form a rooted
tree. The leaf nodes denote class labels or class distribution.
The non-leaf nodes denote a test on an attribute, and branches
denote the outcomes of the test [12]. Online rule generation
software is required by researchers and data mining personnel
who have to generate rules to facilitate the development of
expert systems or pattern recognition in various domains.
Presently researchers use their individual programs running on
their desktops [18]. Online software can be accessed easily
using the default browser on the client machine, but it is not
available yet. So there is a need to develop online decision
rule generation software, referred to as ‘GenRule’.

To build software, it is important to go through a series of 
predictable steps. The steps are like a roadmap that helps to 
develop a high quality system. This roadmap is also called a 
software process.
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The software development life cycle (SDLC) is the entire 
process of formal, logical steps taken to develop a software 
product [13]. There are five phases that are part of the SDLC 
[3]. These phases are requirements definition, design, coding, 
testing and maintenance. SDLC models are created based on 
the order in which they occur and the interaction between them 
[5]. The modified waterfall model developed by Royce has
been used in the development of GenRule.

The present paper is an attempt to identify and document the
requirements for developing the online software described
above. The rest of the paper is organized as follows.
Section 2 presents the software process model concepts. Its
sub-sections deal with the various phases of the SDLC, namely
requirement analysis, design, coding, testing and maintenance.
Section 3 presents the conclusion.


2. SOFTWARE PROCESS MODEL 
A software process model is an abstract representation of the
architecture, design or definition of the software process [14].
There are a variety of software development process models that
show how organizing the process activities can make the
development more effective [1]. One of the basic software
process models is the waterfall model. But it is not flexible; its
phases are strictly linear [9]. So Royce’s modified final
waterfall model has been used in the development of the rule
generation software using a decision tree classifier.

The advantage of the modified waterfall model is that it is a 
more relaxed approach to formal procedures, documents and 
reviews. It also reduces the huge bundle of documents. Due to 
this, more time can be devoted to coding without bothering 
about the procedures. This in turn helps to finish the product 
faster [9]. The different phases in the Royce’s final waterfall 
model [15] with reference to the development of GenRule are 
explained in the subsequent sub sections. 


2.1 Requirement Analysis 

Requirements are the set of functionalities and constraints that the
end user expects from the system. For GenRule, users are mainly
developers of expert systems, students and data mining
researchers who are interested in generating rules from data.
Recently, expert systems are being developed for various
agricultural crops like wheat, maize, mustard etc. In production
systems like expert systems, knowledge is required to be fed in
the form of rules. Usually these rules are made at the expense
of valuable time of experts. In the field of agriculture, vast
amounts of research data are generated every day, and to
convert those huge amounts of data to useful and
knowledgeable decision rules that can help in crucial decision
making, decision rule generation software is needed.

Consequently the experts could spend their time on
validating the generated rules, which are provided along with
accuracy measures. Rule generation using a decision tree
classifier provides the additional benefit of visualization
in the form of a decision tree, which adds to the understandability
of the rules.

Information about users’ requirements was gathered from the
literature [11,7] and also by consulting prospective users
such as researchers involved in the development of expert systems. It
helped to establish what should be accomplished by the
application. These requirements were further analyzed for their
validity, and the feasibility of incorporating them in GenRule
was explored.

User requirements are broadly categorized into two types 
namely functional and non-functional requirements. Functional 
requirements describe what the software should do. They 
include user requirements, input requirements, computational 
requirements, output requirements, exception handling etc. 
Non-functional requirements refer to the requirements that are 
not directly concerned with the specific functions delivered by 
the system [14]. They are broadly categorized into performance 
requirements and system requirements. The first one deals with 
the level of performance required by users. It includes various 
other requirements like usability, human factor and security 
issues etc. The system requirements for GenRule are identified 
at three levels namely, client level, server level and at the 
programmer level. 


2.1.1 Functional Requirements 

The sequence diagram facilitates the understanding of user 
requirements [14]. On the basis of interaction with various 
categories of users, sequence diagram for GenRule is presented 
in Fig 1. It explains the sequence of actions to be followed by 
user to generate rules using GenRule. The sequence diagram 
clearly exhibits the following functional requirements of the 
users. 

i. The user has to be validated by checking the user name and 
password. Only the valid users should have the facility to 
access the software. 

ii. Input Requirements: Facility should be provided to input the 
data in excel or CSV (Comma Separated Values) file format. 
The user has to enter the partition preference of the dataset and 
select the required attributes from the available attributes and 
finally the target attribute from the selected attributes. 

iii. Computational Requirements: 

(a) The input data should be validated for non-categorical and
missing values. If they are present, an exception will occur and an
error message will be displayed. The input data should be
partitioned randomly into training and test datasets according to
the partition preferences given by the user. There should be
provision to select the required attributes from the whole set of
attributes, and to select the class attribute from the already
selected list of attributes.

(b) The user should be able to classify future data instances
if a classifier model is already built in the software. The user
may be allowed to choose a classifier already built and stored
by GenRule. If the model is not learned, the user has to generate
the rules first and then go for prediction of future instances.

(c) A data flow model is an intuitive way of showing how data is
processed by a system. The data flow diagrams shown in
Fig. 2 and 3 illustrate how the data flows through a sequence of
processing steps in GenRule to generate decision rules. The
graphical user interface is explained with the help of use-case
diagrams (Fig. 4-7). A use-case diagram identifies the user
interactions with the software. The various interactions
involved are described in Fig. 5, 6 and 7 respectively. The
generation of rules from the training data using the ID3 algorithm
in process 1.5 (Fig. 2) is further explained using the data
flow diagram in Fig. 3.

iv. Output Requirements: Facility should be provided to display
the generated rules and the corresponding decision tree view 
along with evaluation measures like rule coverage, rule 
accuracy, precision, recall, F-measure, confusion matrix, 
training accuracy and test accuracy. Exporting and saving 
facility should be provided in Excel, text and XML file format 
for the generated rules for further implementation. For 
improving the understandability of the rules, the corresponding 
decision tree should be displayed and there should be provision 
to save it in XML format. 

v. On logging out, the user should return to the home page.
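
The input validation and random partitioning described under the computational requirements above can be sketched as follows. This is an illustration only, not GenRule's code: the function names validate_rows and partition are hypothetical, and treating any numeric-looking value as non-categorical is an assumption of the sketch.

```python
import random


def validate_rows(rows):
    """Raise ValueError if any value is missing or looks numeric.

    Assumption: a value that parses as a number is treated as
    non-categorical and rejected.
    """
    for index, row in enumerate(rows):
        for attribute, value in row.items():
            text = "" if value is None else str(value).strip()
            if text == "":
                raise ValueError(f"row {index}: missing value for {attribute!r}")
            try:
                float(text)
            except ValueError:
                continue  # not a number, so accept it as a category label
            raise ValueError(f"row {index}: non-categorical value for {attribute!r}")


def partition(rows, train_fraction, seed=None):
    """Randomly split rows into (training, test) by the given fraction."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Passing a seed makes the split reproducible, which is convenient when the same partition has to be used again for validation.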


2.1.2 Non-Functional Requirements 

i. The software should be friendly and available on the internet, 
with the authentication of user name and password. 

ii. The software should meet all kinds of user requirements 
efficiently. 

iii. It should provide accurate output in all aspects.


iv. Online help facility should be included. 


v. Results should be reliable. 

vi. Client level specification: Any browser with latest facility 
like IE6 or higher, Excel 2003 or 2007. 

vii. Server level specification: Windows 7, IIS 7.0, Microsoft
.NET Framework Version 3.0, 2 GB RAM, 2.53 GHz
Processor, 320 GB Hard Disk.

viii. Programmer level specification: Microsoft Windows 7,
Visual Studio 2008, IIS 7.0, 2 GB RAM, Excel 2007, IE8, 320
GB Hard Disk.

The ID3 algorithm is a recursive algorithm for building a
decision tree and decision rules [10]. As a part of requirement
analysis, it is important to get an understanding of the basic ID3
algorithm used. The ID3 algorithm is explained using the flow
chart shown in Fig. 8.
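
For readers without access to the flow chart, the recursion can be sketched in a few lines. This is a minimal illustration of the textbook ID3 procedure (entropy and information gain on categorical attributes), not the GenRule implementation; the function names and the nested-dictionary tree representation are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy of the target attribute over the given rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Reduction in entropy obtained by splitting rows on one attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attributes, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority class
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted(set(row[best] for row in rows)):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```

Each root-to-leaf path of the returned tree corresponds to one if-then decision rule.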


2.2 Design 

The requirement specifications from first phase were studied in 
this phase and system design was prepared. System design 
helped in specifying the hardware and system requirements and 
also helped in defining the overall system architecture. A set of 
software design concepts has evolved over the history of 
software engineering [14]. 
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Figure 1: Sequence diagram for rule generation 
The design of GenRule is presented with the help of input
design, output design, database design and design of modules.
Modularity is one of the powerful concepts for software design.
Most complex design tasks are solved by breaking them down
into manageable parts called modules [2].

2.2.1 Input Data Design 

Data may be entered into the software using an Excel or CSV file.
In the data, the columns should represent the attributes and the
rows should contain the dataset instances. One of the attributes
should be the class attribute. Table 1 represents one such sample
dataset, where lodging is the class attribute.


Table 1: Sample dataset 


Database Design 
Database for the system is maintained using MS SQL server. 
Database should contain a table that is useful to store the 





Table 2: Classifier table schema 
The database constructed corresponding to the sample input
data given in Table 1 is shown in Table 3. The last column
denotes the name of the nodes. The id for each node is given in
the first column. The parent_id column gives the parent node of
each node. If parent_id is zero, it is a root node. This
classification table is useful to classify the unseen cases.
| id | parent_id | Node name |
[Table rows lost in extraction; surviving node names: variety, resistant]

Table 3: Classifier table for crop lodging dataset 
There is a database named ‘aspnetdb’ that contains tables with
the login details of users under the SQL membership provider
functionality of ASP.NET.

2.2.2 Output Design 

Outputs of the GenRule software are decision rules in table format
and the decision tree in tree view format. It also computes various
evaluation measures like confusion matrix, precision, recall,
F-measure, training accuracy and test accuracy. The decision
rules output table contains the rule id and, corresponding to each
rule id, the class attribute column and the other attribute columns
with their respective values for the rule. The value ‘*’ given to
an attribute represents any value of that attribute. Each rule
is associated with its evaluation measures, coverage and
accuracy: coverage explains the data instances covered by a
rule, and accuracy explains the validity of a rule in the data
instances. The coverage and accuracy of the ith rule (Rule_i) can be
computed using the given formula.


$$\mathrm{Coverage}(Rule_i) = \frac{n_{covers}(Rule_i)}{|\text{training dataset}|}$$

$$\mathrm{Accuracy}(Rule_i) = \frac{n_{correct}(Rule_i)}{n_{covers}(Rule_i)}$$


where $n_{covers}(Rule_i)$ represents the number of data instances
satisfying the antecedent of Rule_i and $n_{correct}(Rule_i)$
represents the number of data instances correctly classified by
Rule_i. The decision rule output table of the data given in Table 1
should be like Table 4. Rules should also be displayed in classic
if-then format (Table 5). A confusion matrix contains
information about actual and predicted classifications done by a
classification system [6]. The performance of such systems is
commonly evaluated using the data in the confusion matrix.
A confusion matrix can be worked out for both training and test
datasets. All the performance measures computed are functions
of the confusion matrix (Table 6). Precision, recall and
F-measure can be computed for the two class values.
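
The rule measures defined above, together with precision, recall and F-measure, can be sketched as follows. The function names and the (antecedent, predicted class) rule representation are hypothetical, and the F-measure is assumed here to be the usual harmonic mean of precision and recall, since the paper does not spell out its formula.

```python
def rule_metrics(rule, rows, target):
    """Return (coverage, accuracy) of one rule on a dataset.

    rule is a pair (antecedent, predicted_class), where antecedent is a
    dict of attribute -> required value (an assumed representation).
    """
    antecedent, predicted = rule
    covered = [row for row in rows
               if all(row.get(attr) == value for attr, value in antecedent.items())]
    n_covers = len(covered)
    n_correct = sum(1 for row in covered if row[target] == predicted)
    coverage = n_covers / len(rows)
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure for one class from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

An empty antecedent covers every instance, which matches the ‘*’ wildcard convention of the rule output table.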


[Table 4: decision rule output table for the crop lodging dataset. Column headers: lodging, variety, Crop (remainder of name illegible), climate, N_fertilizer. Cell values are not recoverable.]

Decision Rule | Rule Coverage (%)
[...] then lodging='yes'
If [variety='resistant'], [climate='dry'], then lodging='yes'
If [variety='resistant'], [climate='rainy'], then


Copy Right © BIJIT — 2013; January — June, 2013; Vol. 5 No. 1; ISSN 0973 — 5658 

















If [variety='tolerant'], [N_fertilizer='high'], then lodging='no'    14
5  If [variety='tolerant'], [N_fertilizer='low'], then lodging='yes'    21





Table 5: classic if-then rule for crop lodging dataset 
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Table 6: performance measures on crop lodging classifier 


2.3 Coding 

The functionalities of GenRule can be achieved by developing 
various modules and assigning different tasks (Table 7). 

The implementation of the defined modules in the design phase 
was done using classes and methods (Table 8). 


2.4 Testing, Integration and Maintenance 

Each of the modules (Table 7) should be tested individually for its
functionality during unit testing for the desired
output. These units should be integrated into a complete system
during the integration phase and should be tested to check that all
modules/units coordinate with each other and that the system as
a whole behaves as per the specifications. Bottom up
integration was used for combining the various modules [2].

The software has to be maintained as long as it is used for 
various applications. There should be proper Cece ne eer to 
facilitate the operati 












Module Name | Description
Login | Provide facility of login to users
(module name not recoverable) | An option for change of password
Decision Rule Generation | The main module, which generates decision rules from the training dataset along with rule evaluation measures like rule coverage and accuracy
Decision Rule Validation | The module which validates the generated decision rules using the test dataset and provides evaluation measures
Decision Tree | Constructs the decision tree corresponding to the generated rule set
(module name not recoverable) | Provide the prediction of target attribute values for the future datasets which are unclassified. It is useful if the classifier is preexisting.
Prediction | Provide the prediction of target attribute values for the future datasets which are unclassified, successively after building the classifier
Sample Data | Provide sample data for the user
(module name not recoverable) | Provide contact details of the development team
Help | Provide online help about software


Table 7: Description of various modules in GenRule 


Table 8: Description of different classes in GenRule 


3. CONCLUSIONS AND FUTURE SCOPE 

The software process model explains the requirement
specifications, input design, output design, database design,
implementation, testing and maintenance phases. In this paper
the software development life cycle of decision rule generation
software is presented from its conception to its implementation
and maintenance stage. The process model helped to
develop GenRule software that can fulfill the requirements of
agricultural researchers, teachers, students and other data
mining personnel handling huge amounts of data. The model
will also be helpful in future for any enhancements or
maintenance of the software.

The software process explained above can be utilized for many 
other decision tree algorithms. 
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Abstract - Speech recognition is the ability of a machine or 
program to convert spoken words into their equivalent text form.
Nowadays, most recognition systems use Hidden Markov
Models for modeling the spoken utterances. In this paper we
have implemented two speaker independent speech
recognition systems which include all the words required for
dialing a phone. The systems contain 42 words including
digits from zero to nine and also include names of 20 persons.
A total of 16,800 utterances have been used for training each
system. The two systems are able to recognize continuous
speech and are implemented with the help of monophones
and triphones using HTK. Experimental results show an
accuracy of 74.11% for monophone based models and
93.77% for triphone based models.


Index Terms - HMM, HTK, Monophones, Triphones, Mel 
Frequency Cepstral Coefficient (MFCC). 

1. INTRODUCTION

Pattern recognition is an important area of the machine learning
domain. The domain of pattern recognition is itself quite wide
and encompasses several other interesting areas. The basic goal
of a pattern recognition problem is to enable a machine to
identify which class, among a set of given classes, a
test pattern belongs to. One interesting application of this area is
presented in [1] for generation of traffic models in urban areas.
[2] presents research on the problem of face
recognition, which is nowadays widely used as a measure to
authenticate users. [3] presents a review of
statistical pattern recognition methods.

A subset of the pattern recognition domain is the area of speech
recognition, where spoken utterances are the patterns that are
intended to be recognized. The process of speech recognition
involves communication between persons and machines,
where an automaton is generated to report the written equivalent of
spoken words. Since the 1950s researchers have been trying to build a
device that can recognize the human voice. In 1952, at Bell
Laboratories, a system for isolated digit recognition was built
by Davis, Biddulph and Balashek. The system relied heavily on
the spectral resonances of the vowels of each digit. After that, a
lot of work on speech recognition has been done all over the
world.
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Some speech recognition systems give very good accuracy of 
more than 95% and are able to transcribe more than 150-160
words per minute. Speech recognition systems are improving
rapidly. Nowadays, many
hand-held devices like mobile phones, iPods and iPhones are
trying to provide a good recognition system, and research is still
going on to improve the recognition accuracy. In
the present era, mainly Hidden Markov Model (HMM) based
speech recognition systems are used. An HMM is a doubly
embedded stochastic process with an underlying stochastic
process that is not directly observable but can be observed only
through another set of stochastic processes that produce the
sequence of observations [4]. HMMs were first discussed in
the second half of the 1960s in a series of statistical papers by
Leonard E. Baum and his colleagues [5]. In the 1970s they were
first used as a tool for speech recognition by Baker [6] at CMU
and by Jelinek and his colleagues at IBM [7]. Since then, due to
their strong mathematical structure, HMMs have gained popularity
and are used in a wide range of applications, such
as handwriting recognition [8], the natural language domain and
forecasting stock prices for interrelated markets [9].
HMMs can also be used for speech recognition in other
languages. In 2006 Gupta built an isolated word speech
recognition system for Hindi digits using continuous HMMs [10]. Also
in 2011 Kumar and Aggarwal built a Hindi recognition system
using HTK which recognized 30 Hindi words [11]. In 2011
Nguyen presented a paper which describes a study of building a
Vietnamese speech recognition system using HTK. The system
gives an accuracy of 71.37% for speaker independent
recognition before speaker adaptation and 75.96% after speaker
adaptation [12]. HTK has also been used for speech recognition
in other languages such as Arabic [13].

In this paper, a speaker independent recognition system is
implemented with the help of Hidden Markov Model Toolkit 
which can be used to recognize continuous speech. The system 
includes all the commands required for dialing a phone. It 
consists of numbers from zero to nine and commands like 
“Call”, “Dial”, “Phone”, “Flash”, “Hangup”, “Hash”, “Star”, 
“Redial” and “Hold”. It also contains names of 20 persons. A 
total of 42 words have been used to make the system. The 
system is implemented using both monophones and triphones 
as base units. Experimental results show that the accuracy 
based on triphones models is much higher than the 
monophones based system. 


2. HIDDEN MARKOV MODEL TOOLKIT (HTK) 


HTK is a software toolkit for building and manipulating 
systems that use continuous density Hidden Markov models 
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(HMMs) [14]. It is a collection of library modules written in C 
that can be combined to design a system. The first version
of HTK was developed at the Speech, Vision and Robotics
Group of the Cambridge University Engineering Department
(CUED) in 1989 by Steve Young. The tools provide
sophisticated facilities for speech analysis, HMM training,
testing and results analysis. The software supports HMMs
using both continuous density mixture Gaussians and discrete
distributions and can be used to build complex HMM systems
[15].

2.1 HTK Implementation Structure

The different steps for building the HMMs using the toolkit are
detailed below:

• Data Preparation: In this phase a database has been created by
collecting data from 20 different speakers. Each speaker has 20
utterances of each word, giving a total of 16800 (20*42*20)
utterances. The data is recorded using a CSL workstation in a
laboratory environment. A distance of approximately 5-10 cm
is kept between the mouth of the speaker and the microphone. Sounds
are recorded at a sampling rate of 16000 Hz. After recording
has been done, all the words are manually labeled and stored
with a logical name.

• Feature Extraction: As it is very complex to work with raw
speech data, it is important to extract all relevant acoustic
information in a compact form from raw speech. We use Mel
frequency cepstral coefficients (MFCCs) [16] to extract feature
vectors from the recorded raw data.

• Model Training: In this phase, the first thing required is to
define a prototype model which contains the information about
the characteristics and the topology of the HMM. For our
system, the topology used is 3-state left-right with no skips
[17]. With the help of this proto file we generate the first HMM
and then repeatedly re-estimate it to get the required optimal
model.
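
The 3-state left-right topology with no skips can be illustrated by its transition matrix. The sketch below follows the HTK convention of adding a non-emitting entry state and a non-emitting exit state around the three emitting states; the function name and the initial self-loop probability of 0.6 are illustrative assumptions, not values taken from the authors' proto file.

```python
def left_right_transitions(num_emitting=3, stay=0.6):
    """Transition matrix for a left-right HMM with no skip transitions.

    States 0 and num_emitting + 1 are non-emitting entry and exit states.
    Each emitting state may only loop on itself or move one state right.
    """
    n = num_emitting + 2
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0][1] = 1.0  # the entry state always moves to the first emitting state
    for state in range(1, n - 1):
        matrix[state][state] = stay            # self-loop
        matrix[state][state + 1] = 1.0 - stay  # advance to the next state
    return matrix  # the exit state row stays all zero
```

Because every entry more than one column to the right of the diagonal is zero, re-estimation can never introduce a skip: zero transition probabilities remain zero during training.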


3. IMPLEMENTATION DETAILS

First of all we make a grammar file which describes the words 
to be recognized. The grammar file for the telephone based 
system contains: 

$digit = ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | 
EIGHT | NINE | ZERO; 

$name = RAM | JOHN | AMIT | HEMANT | BIKASH | 
GOPAL | ARUN | SUMIT | JAMES | NITIN | MAYANK | 
DEBANJAN | ROHIT | ANIL | RAJA | STEVE | JHONSON | 
KRISHNA | NIL | PUNIT ; 

$mode = ON | OFF; 

( SENT-START ( DIAL $digit $digit $digit $digit $digit
$digit $digit $digit $digit $digit | (PHONE | CALL) $name
| SPEAKER $mode | FLASH | HANGUP | HASH | STAR |
REDIAL | HOLD ) SENT-END )

From this grammar file, some sample commands that can be 
formed are listed in Table 1: 

A total of 42 words have been selected to make the recognition
system. The words CALL and PHONE can be used
interchangeably, which gives the user more flexibility
while calling. After making the grammar, it is saved in the
file gram. The symbol $ denotes a string variable, the vertical
bars denote alternatives and the angle braces denote one or
more repetitions. After making the grammar file we need
to make the word network for these words. This is done by
executing the command HParse gram wdnet, which will take the
gram file as input and generate the word network file wdnet
that contains each word-to-word transition.
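
To make the grammar concrete, the sketch below encodes the same word lists and draws random commands from them. It is an illustration only, not a replacement for HParse or HSGen: the function names are hypothetical and the uniform choice between the four branches of the grammar is an assumption.

```python
import random

DIGITS = "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE ZERO".split()
NAMES = ("RAM JOHN AMIT HEMANT BIKASH GOPAL ARUN SUMIT JAMES NITIN MAYANK "
         "DEBANJAN ROHIT ANIL RAJA STEVE JHONSON KRISHNA NIL PUNIT").split()
MODES = ["ON", "OFF"]
SINGLE_WORD = ["FLASH", "HANGUP", "HASH", "STAR", "REDIAL", "HOLD"]
KEYWORDS = ["DIAL", "PHONE", "CALL", "SPEAKER"]


def vocabulary():
    """All distinct words that the grammar can produce."""
    return set(DIGITS) | set(NAMES) | set(MODES) | set(SINGLE_WORD) | set(KEYWORDS)


def random_command(rng):
    """Draw one command allowed by the grammar (without SENT-START/SENT-END)."""
    branch = rng.choice(["dial", "name", "speaker", "single"])
    if branch == "dial":
        return ["DIAL"] + [rng.choice(DIGITS) for _ in range(10)]
    if branch == "name":
        return [rng.choice(["PHONE", "CALL"]), rng.choice(NAMES)]
    if branch == "speaker":
        return ["SPEAKER", rng.choice(MODES)]
    return [rng.choice(SINGLE_WORD)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(vocabulary()), "words in the vocabulary")
    print(" ".join(random_command(rng)))
```

Counting the distinct words confirms the vocabulary size of 42 stated in the text: 10 digits, 20 names, 2 modes, 6 single-word commands and the 4 keywords DIAL, PHONE, CALL and SPEAKER.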

Command
CALL RAM (any name chosen from $name in the grammar file)
PHONE RAM (any name chosen from $name in the grammar file)
STAR
REDIAL


Table 1: Sample commands using the grammar file 
The next step is to build the list of phonemes for each of the 
words in the vocabulary. This is done by using the command 
HDMan -m -w wlist -n monophones|! -| dlog dict names which 
will take as input names and wlist and generate the fist of 
phonemes in the file monophones!.The wlist file contains the 
list of words and names file is same as wlist except that it also 
contains the phoneme sequences of the words. Table 6 contains 
all the words along with their corresponding phonemes. Now 
the silence sil is added to the list and saved in file monophone0. 
In addition SENT-END and SENT-START is augmented. 
In order to train the system with the given words, the list of 
words to be spoken for training is generated. It is generated by 
using the command HSgen -l -n k wdnetdict > trainprompts 
which will use wdnet and dict files and generate the train 
prompts that contain a total of k training sentences. Next, the 
recording of all these sentences are to be done using the 
software HSLab provided by the toolkit. 
For training the system, each word in the train.mlf file must be
replaced with its corresponding phonemes. This is done by
executing the command HLEd -l '*' -d dict -i phones0.mlf
mkphones0.led train.mlf, which takes the train.mlf file, the
dict file and the mkphones0.led file as input and writes the
corresponding phonemes to the file phones0.mlf. The train.mlf file
contains the trainprompts sentences, and the mkphones0.led file
contains the commands used to replace each word with its
corresponding phonemes.
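The word-to-phoneme expansion described above can be illustrated with a short Python sketch. It assumes, as in the standard HTK tutorial edit script, that a silence label is placed at the start and end of each sentence; the dictionary entries and the function name are hypothetical, and HLEd itself is not being reimplemented here.

```python
def words_to_phones(words, pron_dict, boundary="sil"):
    """Expand a word-level transcription into a phone-level one."""
    phones = [boundary]          # assumed sentence-start silence
    for word in words:
        phones.extend(pron_dict[word])
    phones.append(boundary)      # assumed sentence-end silence
    return phones


# Hypothetical dictionary entries for two vocabulary words.
pron_dict = {"CALL": ["k", "ao", "l"], "RAM": ["r", "ae", "m"]}
expanded = words_to_phones(["CALL", "RAM"], pron_dict)
```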
The next step is to parameterize the raw speech waveforms into
sequences of feature vectors. This is done by the command
HCopy -T 1 -C cfg_mfc -S code_mfc.scp. The command
takes the code_mfc.scp and cfg_mfc files as input. The scp file
contains the locations of the .wav files and also the locations of
the .mfc files to be created. The configuration file cfg_mfc can
be set as shown in Table 2 below:










TARGETKIND MFCC_0_D_A
TARGETRATE 100000.0
Table 2: Contents of the cfg_mfc file
The monophone HMMs are now generated using the following
steps:
For training the HMM, first of all a proto file is defined which
specifies the model topology. In our experiments the topology
used is a 3-state left-to-right model with no skips. The command
HCompV -C config -f 0.01 -m -S tr_mfc_mono.scp -M hmm0
proto is executed to generate a new version of the file proto and
the file vFloors in the hmm0 directory. The config file contains
the single line TARGETKIND = MFCC_0_D_A, and the
tr_mfc_mono.scp file contains the locations of the MFC files.
After that, the model saved in the file proto is copied for
each phoneme entry in the hmmdefs file. The contents of
vFloors are also copied to a file named macros.
The flat-start monophone models must then be re-estimated.
This can be done by executing the command
HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S
tr_mfc_mono.scp -H hmm0\macros -H hmm0\hmmdefs -M
hmm1 monophones0, which generates the hmmdefs and macros
files in the hmm1 directory. By executing the command two more
times, the hmmdefs and macros files are generated in the hmm3
directory. The previous step generates a 3-state left-to-right
HMM for each phone and also an HMM for the silence model
sil. Next we need to create a 1-state short pause model sp by
copying the contents of the sil model into the sp
model. Since sp has its emitting state tied to the centre state of
the silence model, the centre state is retained and the other states
are deleted.
To make the model more robust, an extra transition is added
to the sil model so that it absorbs the various
impulsive noises in the training data. This can be done by
executing the command HHEd -H hmm4\macros -H
hmm4\hmmdefs -M hmm5 sil.hed monophones1, where the
input files are monophones1 and sil.hed. The sil.hed file
contains the following commands:
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
The AT command adds transitions to the given transition
matrices and the TI command creates a tied state called silst.
When we execute the HHEd command, we get the corresponding
hmmdefs and macros files in the hmm5 directory. Finally,
another two passes of HERest are applied using the phone
transcriptions with sp models between words. The resulting
models are stored in the hmm7 directory.
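The effect of an AT command on a transition matrix can be sketched as follows. The sketch assumes the behaviour documented for HHEd, namely that the new transition receives the given probability and the remaining transitions out of the same state are rescaled so that the row still sums to one. The 5-state matrix is a made-up example, not the trained sil model.

```python
def add_transition(trans, i, j, prob):
    """Return a copy of `trans` with a(i, j) = prob (states are 1-indexed).

    The other entries of row i are rescaled so the row sums to 1.
    """
    new = [row[:] for row in trans]
    row = new[i - 1]
    rest = sum(p for k, p in enumerate(row) if k != j - 1)
    for k in range(len(row)):
        if k == j - 1:
            row[k] = prob
        elif rest > 0:
            row[k] *= (1.0 - prob) / rest
    return new


# Hypothetical 5-state matrix (entry and exit states plus 3 emitting states).
sil_trans = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
# Corresponds to "AT 2 4 0.2": add a skip from state 2 to state 4.
updated = add_transition(sil_trans, 2, 4, 0.2)
```

After the call, row 2 of the matrix holds the new skip transition and its original entries are scaled by 0.8.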

Since the dictionary contains multiple pronunciations of some
words, the phone models created so far can be used to
realign the training data and create new transcriptions. This can
be done by executing the command HVite -l '*' -o SWT -b
silence -C config -a -H hmm7/macros -H hmm7/hmmdefs -i
aligned.mlf -m -t 250.0 -y lab -I train.mlf -S train.scp dict
monophones1, which uses the HMMs stored in hmm7 to
transform the input word level transcription train.mlf into the
new phone level transcription aligned.mlf using the
pronunciations stored in the dictionary dict. Once the
aligned.mlf file is created, we execute another two passes of
HERest, which store the required HMMs in the hmm9
directory.

Now we are ready to run the recognizer on live input. For this,
a configuration file config2 is needed, which converts the
input data into its parameterized form. The config2 file
contains the following parameters and their values:


SOURCERATE 625.0 
SOURCEKIND 















Table 3: Contents of the config2 file
To recognize a word, the command HVite -H
hmm9/macros -H hmm9/hmmdefs -C config2 -w wdnet -p 0.0
-s 5.0 dict monophones1 is used, which uses a token passing
algorithm to perform Viterbi-based speech recognition. The
Viterbi algorithm finds the best state sequence for the
observation sequence obtained from the previous steps. The
command takes wdnet, dict, monophones1 and a set of HMMs as
input. It converts the word network to a phone network and then
attaches the appropriate HMM definition to each phone instance.
When we run the command, it first measures the speech and
background silence levels by prompting the user to speak an
arbitrary sentence. After that it repeatedly recognizes the
spoken word and prints it to the terminal.
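The idea of finding the best state sequence can be illustrated with a textbook Viterbi search on a toy discrete HMM. This is a minimal sketch and not HVite's token passing implementation; the two states, the observation symbols and all probabilities are invented for the example.

```python
import math


def lg(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # delta[s]: best log score of any path ending in state s at this time.
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            delta[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Toy left-to-right model with made-up probabilities.
states = ["A", "B"]
log_start = {"A": lg(1.0), "B": lg(0.0)}
log_trans = {"A": {"A": lg(0.5), "B": lg(0.5)},
             "B": {"A": lg(0.0), "B": lg(1.0)}}
log_emit = {"A": {"x": lg(0.9), "y": lg(0.1)},
            "B": {"x": lg(0.2), "y": lg(0.8)}}
score, path = viterbi(["x", "x", "y", "y"], states,
                      log_start, log_trans, log_emit)
```

Working in the log domain avoids numerical underflow on long observation sequences.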
The triphone-based HMMs are generated with the help of the
following additional steps:
First the command HLEd -n triphones1 -l '*' -i wintri.mlf
mktri.led aligned.mlf is executed, which converts the
monophone transcriptions in aligned.mlf to an equivalent set of
triphone transcriptions in wintri.mlf. A list of triphones is also
saved in the triphones1 file. The mktri.led file is an edit script
which contains WB sp, WB sil and TC, where the WB commands
define sp and sil as word boundary symbols. Next we
make an edit script mktri.hed containing a clone command CL
followed by TI commands to tie all the transition matrices in
each triphone set. The models are then cloned by
using the command HHEd -B -H hmm9/macros -H
hmm9/hmmdefs -M hmm10 mktri.hed monophones1. Finally,
another three passes of HERest, of the form HERest -B -C config -I
wintri.mlf -t 250.0 150.0 1000.0 -s stats -S train.scp -H
hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1, are
applied, and the resultant models are saved in the hmm13 directory.

[Figure 1: Block diagram representing the working of
monophones and triphones-based recognition system.]
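The conversion from monophone to triphone labels can be sketched in Python. The sketch assumes word-internal expansion in the usual l-p+r notation, with sil and sp treated as word boundary symbols that are left unchanged and never used as context; the phone sequence is a hypothetical example and the function is not part of HTK.

```python
def to_triphones(phones, boundaries=("sil", "sp")):
    """Convert a phone sequence into l-p+r triphone labels."""
    out = []
    for idx, p in enumerate(phones):
        if p in boundaries:
            out.append(p)  # boundary symbols are copied unchanged
            continue
        left = phones[idx - 1] if idx > 0 else None
        right = phones[idx + 1] if idx + 1 < len(phones) else None
        label = p
        if left is not None and left not in boundaries:
            label = left + "-" + label
        if right is not None and right not in boundaries:
            label = label + "+" + right
        out.append(label)
    return out


# Hypothetical phone-level transcription of two words.
mono = ["sil", "k", "ao", "l", "sp", "r", "ae", "m", "sil"]
tri = to_triphones(mono)
```

Phones at word edges become biphones because the boundary symbol on that side is not used as context.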

For live recognition, we can use the command: 


HVite -H hmm13/macros -H hmm13/hmmdefs -C config2 -w
wdnet -p 0.0 -s 5.0 dict triphones1


The complete training and recognition process is explained
with the help of the block diagram in Figure 1.


4. EXPERIMENTAL RESULTS 

To find the accuracy of the system, 20 speakers were
selected to test it. Of these 20 speakers, 10 are speakers
whose voices were already included while
making the model, and 10 are new speakers. Each person
speaks 20 utterances of each word,
resulting in a total of 400 (20*20) utterances per word. The
system is tested for both monophone- and triphone-based
models and the results are shown in Table 4 below:
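The per-word accuracy figures can be related to the number of test utterances with a small calculation. The sketch below assumes 400 test utterances per word (20 speakers times 20 utterances); the count of 309 correct recognitions is a hypothetical value chosen only to show how a percentage such as 77.25 arises.

```python
def accuracy_percent(correct, total):
    """Recognition accuracy as a percentage of test utterances."""
    return 100.0 * correct / total


speakers = 20
utterances_per_speaker = 20
total = speakers * utterances_per_speaker   # 400 test utterances

# Hypothetical count of correctly recognized utterances.
example = accuracy_percent(309, total)
# One extra correct utterance changes the score by this step.
step = accuracy_percent(1, total)
```

With 400 utterances, every accuracy value is a multiple of 0.25 percentage points.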


Recognition accuracy (in percentage)

No. | Word | Monophones-based HMMs | Triphones-based HMMs
(illegible) | (illegible) | 77.25 | 94.25
(illegible) | (illegible) | 74.5 | 95.5
(illegible) | (illegible) | 71.5 | (illegible)
(illegible) | (illegible) | 76.25 | (illegible)
(illegible) | (illegible) | 80.25 | (illegible)
...
(illegible) | MAYANK | 72.25 | (illegible)
...
31 | RAM | 69.25 | 94
...
40 | THREE | 76.75 | 94.25

Table 4: Word recognition performance using Monophones
and Triphones HMMs.


Table 5 below shows the confusion matrix of the most frequently
misrecognized words, and also of the word “six”, which gives a
good recognition result.

[Table 5 body is not recoverable from the source; legible entries
include the words ZERO, ROHIT, JOHN, GOPAL, KRISHNA and NITIN.]

Table 5: Words that are confused with other words.
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5. ANALYSIS AND DISCUSSIONS 

From Table 4 above it is observed that, for the
monophone-based system, the words hash and flash give low
recognition scores. This may be because the
pronunciations of the two words are very similar to each other.
The words two, Rohit, Punit and Sumit also give poor
results. From this we can infer that words containing the
dental sound /t/ can confuse the system. Sometimes the pruning
of the words during labeling is not done properly and some part
close to the boundary of a word gets removed. As a result,
the models built from that data are not properly trained and
give low recognition scores. So while cutting a word for
labeling it is necessary to leave some silence region before and
after each word in the data preparation stage.

Figures 2 and 3 below depict the spectrograms
of the worst recognized word ‘TWO’ and the best recognized word
‘SIX’ found in the course of the experiments. It is clear from
the plots that, though the formants are well marked and steady
for “two”, there is less variation that can be captured and
modeled by our system. On the other hand, the spectrogram plot
for “six” clearly shows numerous acoustic changes in the
formant structure during the utterance. We can infer that these
transitions are well captured by the
triphone-based HMMs. Though some earlier experiments show
poor results for the word “six”, in our experiment “six” gives
a very good result. This may be due to the good recording quality,
or because the spotting and pruning of the word is proper for “six”.


[Figure 2: Spectrogram of “Two”. Figure 3: Spectrogram of “Six”.]

For the triphone-based recognition system it is observed that all
the words give reasonably good results compared with the
monophone-based system. The monophone-based recognition
system gives an accuracy of 74.11% while the triphone-based
recognition system gives 93.77%. So, from the experiment we
can clearly say that the triphone model gives much better
results than the monophone-based model.


6. CONCLUSIONS AND FUTURE WORK


In this paper two telephone-based recognition systems are
developed with the help of monophones and triphones using
the Hidden Markov Model Toolkit (HTK). The system is able to
recognize the words spoken by speakers inside and outside the
training set for both continuous and isolated words. The whole
experiment has been carried out in a normal room environment.
The system gives an accuracy of 74.11% for monophone-based
and 93.77% for triphone-based models. The triphone-based
models perform far better than the monophone-based models.
Work is now underway to semi-automate the generation of the
training models so that a speech recognition system can be
deployed at short notice. The authors are also planning to update
the computation of the feature vectors in HCopy to include new
acoustic-phonetic features within the toolkit.

Word | Phonemes | IPA
BIKASH | b ih k ax sh | (illegible)
...
HASH | hh ae sh | (illegible)
HOLD | hh ow l d | /hould/
...
NINE | n ay n | (illegible)
NITIN | n ih t ih n | (illegible)
OFF | (illegible) | (illegible)
ON | (illegible) | (illegible)
ONE | w ah n | (illegible)
PHONE | f ow n | (illegible)
...
RAM | r ae m | (illegible)
SEVEN | s eh v n | (illegible)
...
THREE | th r iy | (illegible)

Table 6: The phonetic breakup and IPA
representation of all the words.
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Abstract - In the current scenario, most of the applications 
are based on graphical user interfaces and depend on
object-oriented technology. Software industries are
interested in converting old structured software into
object-oriented software, and also in reducing the lines of
code of an application so as to reduce its execution time.
It is therefore a big challenge to reduce the
execution time of an application based on object-oriented
technology. The present work deals with the
reduction of execution time for the superscalar machine by
the use of an object-oriented approach. A well known modeling
language, the Unified Modeling Language (UML), is used to
model the superscalar pipeline architecture. UML class and
sequence models are designed before the execution time is
computed, and the computed results are depicted in the form
of tables and graphs. Comparisons are also made between
two object-oriented programming languages.


Index Terms - Superscalar pipeline architecture, performance
evaluation, class model, sequence model and unified
modeling language.


1. INTRODUCTION 

Pipelining is one of the important techniques that have been
implemented to improve the performance of a processor. It
allows the concurrent execution of several instructions. A task,
program or process is divided into a sequence of subtasks, and
each subtask is executed by a specialized hardware stage which
operates concurrently with the other stages in the pipeline. There
are several categories of pipeline, such as the arithmetic pipeline,
instruction pipeline, memory access pipeline and superscalar
pipeline. A superscalar pipeline architecture can start two or more
instructions in parallel in one core, and independent
instructions may be executed out of order. Hwang [1] is an
important reference on parallelism, scalability and
programmability, describing these aspects with increasing system
resources and with respect to parallel, vector and scalar
instructions. Mano [2] describes computer organization and design
as well as programming using basic components.

Patterson and Hennessy [3] cover the most fundamental areas
of computer architecture, including recent technologies such as
multicores and multiprocessors.

Cragon and Saini [4,5] give an in-depth treatment of the
implementation details of pipelined processors and memory
systems, and of the “micro architecture” of
modern computers and microprocessors, exploring
techniques for solving design problems inherent in computers
with a high level of concurrency, such as the demand for a memory
system with low latency and high bandwidth.

Unified Modeling Language (UML) is a general purpose
modeling language, created by the Object Management Group
(OMG) [6,7] and widely accepted by software professionals,
which is used to model various kinds of research problems; its
development stages are well explained by Booch et al. [8]. The
fundamentals of UML, using hands-on projects, drills and
mastery checks which illustrate how to read, draw and use
this visual modeling language to create clear and effective
blueprints for software development projects, are explained by
Roff [9]. UML is also used to model the concurrent distributed
and real time applications which help the researchers to 
leverage the powerful flexibility and reliability of the system. 
UML also helps the designers at every stage of the analysis and 
design process and offers exceptional insight into dynamic 
modeling, concurrency and distributed applications designing 
and performance analysis of real time designs [10]. Using
distributed computing, the performance of processors for
different object-oriented software system frameworks has been
measured by Saxena et al. [11]. They chose two
object-oriented software system frameworks, C# based on the
Microsoft .NET framework and Visual C++ based on Microsoft
Foundation Classes (MFC), and computed the performance of
these two object-oriented languages. UML modeling of
instruction pipeline design by two techniques, i.e. with and
without data forwarding, is explained by
Saxena and Raj [12]. The modeling and specification of
floating point numbers are implemented by Boldo et al. [13]. It 
extends an existing tool for the verification of C programs, with 
the new notations specific to the floating point arithmetic. It 
also provides a way to perform the full formal proof by use of 
COQ proof assistant and an open framework which is 
implemented to other floating point models. But the main 
limitation is that it is applicable only to programs using basic. 
The IEEE standard is the most widely used standard for
floating point arithmetic and representation. It is implemented
on most Central Processing Units (CPUs) and Floating
Point Units (FPUs), and specifies basic and extended
floating point number formats and operations such as add,
multiply, divide and square root. It also specifies
conversions between integer and floating point formats, but
it does not specify the formats of decimal strings and integers,
the interpretation of NaNs, or conversions between binary and
decimal to and from extended formats [14]. Saxena and Shrivastava [15]
have attempted to increase the performance of arithmetic 
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pipeline especially for floating point computations after 
designing the complete UML model of static arithmetic 
pipeline design. They presented UML diagrams to model the 
architecture and timing behavior. Saxena and 
Shrivastava [16] also presented floating point computations by 
using nonlinear arithmetic pipelining for instruction coupled on 
Visual C++ and Visual C#. The computations are performed 
inside a loop by varying the number of repetition of terms for 
getting their sum. In the current scenario, the distributed
computing approach is the most popular approach, and in this
regard a comparative study of distributed computing paradigms is
presented in [17]. Quality of service is one of the major issues
for distributed computing applications, and it is
described by Mohan et al. [18] for process centric
development. Design patterns for service oriented
architecture implementation are described by Tere and
Jhadav [19].
In the present work, UML is used to model class and sequence
diagrams for a superscalar pipeline architecture which can
execute two or more instructions in parallel. The authors
evaluate the performance of two object-oriented languages,
Visual C++ and Visual C#, and some of the important
observations are recorded in the form of a table and graphs.


2. BACKGROUND 

2.1 Process Definition 

Let us first explain the process, which is considered as a
program that is to be executed. It can be defined as a unit of
work in modern time sharing systems. To define the process, a
processing element needs to be defined as a stereotype, which is
used to handle some modeling elements based on UML base
classes. A UML class for a process is shown in Figure 1; a process
is identified by its own identification number, represented as
Process_id. The other attributes are Process_size for the size of
a process, and Process_in_time and Process_out_time for the start
and end times of the process. The attribute Process_priority
controls the priority of the incoming process. These attributes
are used by the operations Process_create(), Process_delete(),
Process_update(), Process_join(), Process_suspend() and
Process_synchronize(). The visibility modes of the attributes and
operations are also shown in the figure. A stereotype processing
unit is depicted in Figure 2, and a single instance and multiple
instances of the class Process are shown in Figures 3(a) and
3(b), respectively. The process may consist of segments of code
whose identification numbers are generated, recorded into a list
and granted the processing unit as per the priority of that segment
of code, which behaves as a process. The segments may be
synchronized with the processing unit as per the time of completion
of that segment; therefore, multiple instances of a process are
shown in Figure 3(b).
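The process class and the priority-based granting of the processing unit can be sketched in Python. This is only an illustration: the attribute names follow the UML class described above, while the function grant_order and the convention that a smaller Process_priority value is served first are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Attributes mirror the UML class for a process."""
    process_id: int
    process_size: int
    process_in_time: str
    process_out_time: str
    process_priority: int


def grant_order(processes):
    """Order in which segments would be granted the processing unit.

    Assumption: a smaller priority value means higher priority.
    """
    return sorted(processes, key=lambda p: p.process_priority)


# Hypothetical segments of code, each behaving as a process.
segments = [
    Process(1, 120, "10:00", "10:05", 3),
    Process(2, 80, "10:01", "10:03", 1),
    Process(3, 200, "10:02", "10:09", 2),
]
order = [p.process_id for p in grant_order(segments)]
```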


2.2 Thread 

A thread is defined to control a block of code that runs
concurrently with other threads within the same process. It is a
sequential flow of instructions and is considered a
lightweight process. It is easily handled in an object-oriented way.
Threads run simultaneously in a process and can access the same
object to implement their functionality.
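How several threads of one process can work on the same object can be sketched with Python's threading module. The Counter class and the function run_threads are hypothetical names introduced for this example; the lock is included because concurrent updates to a shared object generally need synchronization.

```python
import threading


class Counter:
    """A shared object that several threads update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock keeps concurrent updates from being lost.
        with self._lock:
            self.value += 1


def run_threads(n_threads, n_increments):
    """Start n_threads threads that all increment one shared Counter."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()   # wait until every thread has finished
    return counter.value


total = run_threads(4, 1000)
```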


<<processing unit>> 
Process 


+Process_id: integer
+Process_size: integer
+Process_in_time: string
+Process_out_time: string
+Process_priority: integer


+Process_create()
+Process_delete()
+Process_update()
+Process_join()
+Process_suspend()
+Process_synchronize()





Figure 1: UML class diagram of a process 


<<Stereotype>> 


processing unit 


Process_id: integer
Process_type: string
Process_cardinality: integer





Figure 2: UML class for processing unit 


Figure 3(a): Single instance; Figure 3(b): Multiple instances


In the current scenario, most window-based applications
are built on the thread concept, as the system supports
synchronization of the subtasks of a process. Threads are
initialized and, after use, are automatically destroyed;
therefore, a thread has a life cycle. The object-oriented
representation of a thread is shown below in Figure 4, in which it
is identified by an attribute called Thread_id. The other attributes
associated with a thread and the thread operations are also shown
in Figure 4.
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+Thread_id: integer 


+Thread_size: integer
+Thread_name: string
+Thread_priority: integer





Figure 4: UML Class Diagram of a Thread 


2.3 Superscalar Processor Architecture

The superscalar processor architecture has a versatile design with
two pipelines, and it can issue two instructions per cycle if
there is no resource conflict and no data dependence problem.
Both pipelines have four processing stages, namely fetch,
decode, execute and store.

Each pipeline has its own fetch, decode, execute and store unit.
The two store units can be used dynamically by the two
pipelines, depending upon their availability in a particular cycle.
There are four functional units: adder, multiplier, logic and load
unit. All these functional units are shared by the pipelines on a
dynamic basis. There is a lookahead window with its own fetch and
decoding logic. The lookahead window is used in the case of
out-of-order instruction issue to achieve better pipeline throughput.


3. UML MODELING FOR SUPER SCALAR 
PIPELINE DESIGN 
3.1 UML Class Diagram 
Figure 5 shows the architectural model of the superscalar
processor. The class Process interacts directly with the PEC, which
executes the assigned task. The PEC controls the process by
exchanging messages between the classes Processor and Memory.
The Processor class has two cores, Core1 and Core2, and
each core has many components which help in process
execution, as shown in the figure.
In this figure, the class L2_cache is shared by the two cores.
Instructions are cached through the class I_cache, whereas D_cache
caches the data; each is a subclass of L1_cache. The class ALU
computes integer arithmetic and logical operations; FPU is
used for floating point operations, as shown in the figure. SU is
used for storing the outputs. The FPU class contains four classes,
namely Adder, Multiplier, Logic and Load unit.


3.2 UML Sequence Diagram 
The UML sequence diagram represents the dynamic behavior
of a system in which objects interact with the help of
message communications. A vertical line shows the life line
of an object, giving a dynamic representation of the system. A UML
sequence diagram for process execution in the
superscalar pipeline architecture is shown in Figure 6.





Figure 5: UML class diagram for superscalar processor


The processor executes instructions quickly through
execution pipelining, which executes multiple instructions at
the same time. An instruction is fetched, decoded and finally goes
to the PEC, where instructions are executed; the results are stored
in the register file and then written back.
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4. EXPERIMENTAL STUDY 

On the basis of the above object-oriented design, let us consider
two object-oriented languages, VC++ and VC#, which work
on the .NET platform. For these two programming languages, the
relative performance of the superscalar processor is computed.
In the computation, let N be the number of independent
instructions which can be executed in parallel through the pipeline
method, and let k be the time required to execute
instructions through m pipelines simultaneously; then the ideal
time required by the scalar base machine is


$T(1, 1) = k + N - 1$ .............. (i)
The ideal execution time on the superscalar machine with m
pipelines is computed by
$T(m, 1) = k + (N - m)/m$ .............. (ii)
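The two ideal-time expressions can be evaluated with a short Python sketch. It assumes the textbook form of the scalar base-machine time, k + N - 1, alongside the superscalar time k + (N - m)/m given above; the values k = 4, N = 100 and m = 2 are hypothetical and chosen only for illustration.

```python
def scalar_time(k, n):
    """Ideal time on the scalar base machine (assumed form k + N - 1)."""
    return k + n - 1


def superscalar_time(k, n, m):
    """Ideal time on a superscalar machine with m pipelines."""
    return k + (n - m) / m


def speedup(k, n, m):
    """Ratio of scalar to superscalar ideal time."""
    return scalar_time(k, n) / superscalar_time(k, n, m)


# Hypothetical values: 4 time units for the first issue,
# 100 independent instructions, 2 pipelines.
t_scalar = scalar_time(4, 100)
t_super = superscalar_time(4, 100, 2)
gain = speedup(4, 100, 2)
```

For large N the speedup approaches m, the number of pipelines, which is the expected ideal limit.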


The computations for ideal execution time are done by taking 
lines of code varying from 10? to 10° and these instructions are 
considered by increasing the size of loop. Execution time is 
computed by taking the average of five runs, and the results are
depicted in Table 1. As expected, the execution time increases
as the lines of code increase; however, comparing VC++
and VC# for long computations (times in milliseconds), it is
observed that VC++ takes less time than VC#. These
results are also represented graphically in Figures 7 and 8, given
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on next page for 107, 10° and 10*,10° lines of code (LOC), 
respectively. 
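The averaging over five runs can be reproduced with a short Python sketch. The run values below are the legible entries of two columns of Table 1 (the largest workload for each language); which workload size those columns correspond to is not assumed here.

```python
from statistics import mean

# Five measured runs (in milliseconds) for the largest workload,
# as legible in Table 1.
vcpp_runs = [8952.00005, 8967.00005, 8780.00005, 8796.00005, 8812.00005]
vcsharp_runs = [9233.00005, 9170.00005, 9264.00005, 9249.00005, 9280.00005]

vcpp_avg = mean(vcpp_runs)
vcsharp_avg = mean(vcsharp_runs)

# Relative saving of VC++ over VC# for this workload, in percent.
saving_percent = 100.0 * (vcsharp_avg - vcpp_avg) / vcsharp_avg
```

The computed averages agree with the last row of Table 1, and the saving for this workload is roughly four percent.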


5. CONCLUSION 

From the above work, it is concluded that UML is a powerful
modeling language accepted by software professionals and also
used to represent hardware architecture problems. For long
computations, software professionals face the problem
of selecting the best object-oriented programming language,
one which works well on any kind of processor architecture.
Therefore, the superscalar processor architecture is modeled by the
use of UML classes, and experiments are performed with
two object-oriented programming languages, Visual
C++ and Visual C#. It is concluded that Visual C++ performs better
than Visual C# for long
computations.


REFERENCES 

[1]. Hwang, K., Advanced Computer Architecture:
Parallelism, Scalability, Programmability, Fourteenth
Reprint, Tata McGraw-Hill Edition, ISBN 0-07-053070-X,
2007.

[2]. Mano Morris, M., Computer System Architecture, Third
Edition, Prentice Hall of India Pvt Ltd. ISBN 978-81-
203-0855-8, 2007.

[3]. Patterson, A. David and Hennessy, L. John, Computer
Organization and Design: The Hardware/Software
Interface, Morgan Kaufmann Publishers Elsevier Inc.,
2005.

[4]. Cragon, G. Harvey, Memory Systems and Pipelined
Processors, Narosa Publishing House, New Delhi, 1998.

[5]. Saini, A., “Design of the Intel Pentium TM Processor”,
Intel Corporation, IEEE, Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=00393370
(Accessed on 14th March 2012).

[6]. OMG (2001), “Unified Modeling Language
Specification”, Available online via
http://www.omg.org.

[7]. OMG (2002), “XML Metadata Interchange (XMI)
Specification”, Available online via
http://www.omg.org.

[8]. Booch, G., Rumbaugh, J., and Jacobson, I., The Unified
Modeling Language User Guide, Twelfth Indian Reprint,
Pearson Education, 2004.

[9]. Roff, T., UML: A Beginner’s Guide, Tata McGraw-Hill
Edition, Fifth Reprint, 2006.

[10]. Gomaa, H., “Designing Concurrent, Distributed and
Real Time Applications with UML”, Proceedings of the
23rd International Conference on Software Engineering
(ICSE’01), IEEE Computer Society, 2001.

[11]. Saxena, V., Arora, D., and Ahmad, S., “Object Oriented
Distributed Architecture System through UML”, IEEE
International Conference on Advances in Computer
Vision and Information Technology (ACVIT-07), Nov.
28-30, ISBN 978-81-89866-74-7, pp. 305-310, 2007.

[12]. Saxena, V. and Raj, D., “UML Modeling for Instruction
Pipeline Design”, World Conference on Science,
Engineering and Technology (WCSET, 2008),
www.waset.org/pwaset (Accessed on 16 Nov, 2011).

[13]. Boldo, S. and Filliatre, J.C., “Formal Verification of
Floating Point Programs”, 18th IEEE Symposium on
Computer Arithmetic (ARITH ’07), pp. 187-194,
Available: http://www.computer.org (Accessed on 16
Nov, 2011).

[14]. Lopez, G., Taufer, M., and Teller, P.J., “Evaluation of
IEEE 754 Floating-Point Arithmetic Compliance across
a wide range of Heterogeneous Computers”,
Proceedings of the 2007 Richard Tapia Celebration of
Diversity in Computing Conference, October 2007,
Orlando, Florida, USA.
Available: http://gcl.cis.udel.edu/publication/
conferences/007tapia_mlopez.pdf (Accessed on 16 Nov,
2011).

[15]. Saxena, V. and Shrivastava, M., “UML Modeling of
Static Arithmetic Pipeline Design”, The ICFAI
University Press, Vol. 7(1), pp. 22-31, February 2009.

[16]. Saxena, V. and Shrivastava, M., “Performance
Evaluation of Non-Linear Pipeline through UML”,
International Journal of Computer and Electrical
Engineering, Vol. 2, No. 5, pp. 860-866, October, 2010.

[17]. Kumar, H. and Verma, A.K., “Comparative study of
Distributed Computing Paradigms”, BIJIT - BVICAM’s
International Journal of Information Technology, Vol.
1(2), Dec. 2009.

[18]. Mohan, K.K., Srividya, A., Verma, A.K. and Gedela,
R.K., “Process Centric development to Improve QoS in
Building Distributed Applications”, BIJIT - BVICAM’s
International Journal of Information Technology, Vol.
1(1), July, 2009.

[19]. Tere, G.M. and Jhadav, B.T., “Design Patterns for
successful Service Oriented Architecture
Implementation”, BIJIT - BVICAM’s International
Journal of Information Technology, Vol. 2(2), Dec.
2010.


Performance Evaluation of Superscalar Processor Architecture Through UML

Copy Right © BIJIT - 2013; January - June, 2013; Vol. 5 No. 1; ISSN 0973 - 5658







[Figure 7: Comparisons for 10 and 10° Lines of Code. x-axis: Number of Computations.]

[Figure 8: Comparisons for 10° and 10° Lines of Code. x-axis: Number of Computations.]


BIJIT - BVICAM’s International Journal of Information Technology


Execution time | VC++ | VC++ | VC++ | VC# | VC# | VC# | VC#
Run 1 | 92.005 | 889.0005 | 8952.00005 | 14.05 | 108.005 | ... | 9233.00005
Run 2 | 109.005 | 889.0005 | 8967.00005 | 14.05 | 108.005 | 936.0005 | 9170.00005
Run 3 | 108.005 | 874.0005 | 8780.00005 | 14.05 | 108.005 | ... | 9264.00005
Run 4 | 93.005 | 890.0005 | 8796.00005 | 14.05 | 108.005 | 920.0005 | 9249.00005
Run 5 | 108.005 | ... | 8812.00005 | 14.05 | ... | 936.0005 | 9280.00005
Average | 102.005 | 889.4005 | 8861.40005 | 14.05 | ... | 926.40005 | 9239.20005

Table 1: Ideal execution time for superscalar processor
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Abstract - Resource optimization, with advance computing 
tools, improves the efficient use of energy resources. The 
renewable energy resources are instantaneous and needs to 
be conserve at the same time. To optimize real time process, 
the complex design, includes plan of resources and control 
for effective utilization. The advances in information 
communication technology tools enables data formatting and 
analysis results in optimization of use the renewable 
resources for sustainable energy solution on smart grid. 

The paper presents energy computing models for optimally 
allocating different types of renewable in the distribution 
system so as to minimize energy loss. The proposed energy 
computing model optimizes the integration of renewable 
energy resources with technical and financial feasibility. An 
econometric model Identifies the potential of renewable 
energy sources, mapping them for computational analysis, 
which enables the study to forecast the demand and supply 
scenario, The enriched database on renewable sources and 
Government policies customize delivery model for potential to 
transcend the costs vs. benefits barrier. The simulation and 
modeling techniques have overtaken the drawbacks of 
traditional information and communication technology (ICT) 
in tackling the new challenges in maximizing the benefits 
with smart hybrid grid. Data management has to start at the 
initial reception of the energy source data, reviewing it for 
events that should trigger alarms into outage management 
systems and other real-time systems such as portfolio 
management of a virtual hybrid power plant operator. The 
paper highlighted two renewable source, solar and wind, for 
the study in this paper, which can extend to other renewable 
sources. 


Index Terms - Energy Computation, Energy Mapping, 
Techno-Economical feasibility of Renewable Energy, 
Renewable energy model, Energy Efficiency 


1. INTRODUCTION 

Supervisory Control and Data Acquisition (SCADA) systems 
for control of hybrid sources of energy have two components: 
Energy Management Systems (EMS) and Distribution 
Management Systems (DMS). A hybrid EMS/DMS system 
requires higher-level security analysis functions such as state 
estimation and contingency analysis for EMS, and feeder 
voltage and loss optimization for DMS. 


1, 2 IDDC, Indian Institute of Technology (IIT) Delhi, New 
Delhi 
E-mail: 'rajeshkr38@nic.n and ’agarwala@iddc.iitd.ernet.in 

An energy distribution system model with adequately accurate 
predictive analysis plays an important role in sustaining a 
country's energy resources, taking into consideration all 
influential factors in energy generation and distribution. For 
prediction purposes the important parameters are geographical 
location, seasonal influence, the effect of climate change and 
state or area concessions. The renewable energy potential for 
mitigation action on climate change reported in the IPCC 
Special Report on Renewable Energy Sources and Climate 
Change Mitigation is based on four models, namely 
IEA-WEO2009-Baseline, ReMIND-RECIPE, MiniCAM-EMF22 
and ER-2010, for potential scenarios [1]. With abundant data 
and relative economic indicators, energy prediction is 
performed with a closed-loop predictive system based on a 
timing algorithm. In a free-trade energy market like India, 
where the peak load varies abruptly with season and community 
demand, an hourly prediction model is more useful. 
The energy prediction model proposed in this paper has large 
scope to take on an innovative role in the country’s growth in 
the energy security regime. Scientists are working on 
commercial applications of energy modeling. The computation 
and mapping tool is unique in that it enables city planners and 
government to integrate renewable energy on the grid. The tool 
is helpful for planning new substations and infrastructure in an 
ever-growing city. The study optimizes the integration of the 
various renewable energy resources with financial feasibility. 
The model overcomes constraints such as hourly available 
sources, voltage limits, feeder capacity, and the discrete size 
of the available distributed generation and distribution units. 
This paper addresses the following issues: 
- Power grid planners need to account for the impacts 
brought by different kinds of energy sources and programs, 
such as power factor, hybrid energy voltage, load 
management programs, energy efficiency, high renewable 
energy penetration, and energy storage. 
- Evaluation of the cost/benefit of the different technologies. 
- Setting up a planning tool to run a base case and a 
comparable case that has a new technology implemented. 
- Generation of a cost-effective/optimal expansion plan. 
The paper is organized to present the modeling approach for 
computing the complex energy scenario of demand and supply. 
It presents the computations and algorithms used to model 
solar and wind renewable resources, followed by conclusions 
and recommendations. 


2. REVIEW OF ENERGY MODELS 

Energy is a vital input for the social and economic development 
of the community and the state. In technology-driven economies 
the demand for energy in agricultural, industrial and domestic 
activities has increased remarkably, especially in emerging 
countries, which also increases greenhouse gas emissions. The 
cost economics of energy forces more effective use of 
renewable energy sources, i.e. energy which comes from natural 
resources and is also naturally replenished. The dependence of 
renewable energy resources on the climate increases the need 
for complex design, planning and control optimization 
methods [7]. 

Power system planning involves planning of generation, 
transmission, and distribution systems. Generation planning 
begins with mid-term (months to several years) and long-term 
(several years to 10 years) load forecasts because generation 
expansion often requires 2 to 10 years to complete. Once a load 
forecast is available, reliability evaluations are the next step, 
to assess where and when to install the new generation. Finally, 
economic evaluations are performed to determine the optimal 
generation expansion plan. An accurate load forecast leads to 
an economical capacity expansion plan that meets reliability 
requirements [10]. 
Leading vendors of power system planning tools are: Multi 
Area Production Simulation Software program (MAPS) from 
General Electric (GE), Plexos for Power Systems from Energy 
Exemplar, GridView from ABB, and PROMOD from 
Ventyx [20]. 

National Instruments LabVIEW and the LabVIEW Control Design 
and Simulation Module can be used to simulate a full wind 
turbine system, including the turbine, mechanical drive train, 
generator, power grid and controller. The ARIMA model method 
has been employed in a predictive model [11]. 

HOMER is a computer model developed by the U.S. National 
Renewable Energy Laboratory (NREL) to assist in the design 
of micro-power systems. HOMER finds the feasibility of the 
system by assessing whether it can adequately serve the electric 
and thermal loads through an hourly time series simulation over 
one year. It also estimates the life-cycle cost of the system, 
which is the total net present cost of installing and operating the 
system over its lifetime [6]. 
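
As a rough illustration of the life-cycle cost idea described above, the total net present cost can be sketched as the capital cost plus the discounted sum of the yearly operating costs. The function name, the constant yearly cost, the discount rate and the cost figures below are illustrative assumptions; they are not HOMER's actual inputs or algorithm.

```python
def net_present_cost(capital_cost, annual_cost, discount_rate, lifetime_years):
    """Capital cost plus the discounted sum of a constant yearly operating cost.

    Simplified textbook form; all arguments are hypothetical inputs.
    """
    discounted = sum(
        annual_cost / (1.0 + discount_rate) ** year
        for year in range(1, lifetime_years + 1)
    )
    return capital_cost + discounted


# Hypothetical micro-power system: 100,000 up front, 5,000 per year,
# 25-year lifetime, 8% discount rate (assumed values, not from the paper).
npc = net_present_cost(100_000.0, 5_000.0, 0.08, 25)
print(f"Net present cost: {npc:.2f}")
```

Comparing this figure across candidate system configurations is one simple way to rank them on cost, under the stated simplifications.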

The RETScreen Plus Performance Analysis Module can be 
used worldwide to monitor, analyse, and report key energy 
performance data to facility operators, managers and senior 
decision-makers. 

The MARKAL model uses an integrated energy system 
optimization framework that enables policymakers and 
researchers to examine the best technological options for each 
stage of energy processing, conversion, and use. This modeling 
framework was used to represent a detailed technological 
database for the Indian energy sector with regard to energy 
resources (indigenous extraction, imports, and conversion) as 
well as energy use across the five major end-use sectors 
(agricultural, commercial, residential, transport, and 
industrial)[6]. 

LINGO is a comprehensive tool designed for building and 
solving optimization models. It can solve Linear, Nonlinear 
(convex & nonconvex/Global), Quadratic, Quadratically 
Constrained, Second Order Cone, Stochastic, and Integer 
optimization models efficiently [26]. 

The renewable energy potential for mitigation action on climate 
change reported in the IPCC Special Report on Renewable 
Energy Sources and Climate Change Mitigation is based on four 
models, namely IEA-WEO2009-Baseline, ReMIND-RECIPE, 
MiniCAM-EMF22 and ER-2010, for potential scenarios. There is 
enormous variation in the detail and structure of the models 
used to construct the scenarios. Many authors have, in the past, 
attempted to categorize models as either bottom-up or 
top-down [1]. 

These models have constraints including hourly available 
sources, the voltage limits, the feeders’ capacity, the maximum 
penetration limit, and the discrete size of the available DG 
units, within the applicable legal constraints [14]. 


3. ENERGY COMPUTATION AND MAPPING MODEL 

For accurate predictive analysis, the influential factors in 
energy application have been taken into consideration in the 
design of the energy model in this paper [20]. Taking as the 
base the collected abundant data and the related economic 
indicator model accepted by international trade standards, 
further strict calculation leads to the relative economic 
indicators. For prediction, the seasonal influence on renewable 
energy application must be fully considered, with a closed-loop 
predictive system based on a timing algorithm, so that the 
predictive model can provide accurate prediction in the light 
of varied data [4][5]. 
The computing and mapping address the energy demand and the 
potential energy resources. Available data were collected using 
a sampling procedure during field work and surveys in 2007, 
continued in 2010. This mapping is expected to provide accurate 
data about the distribution of renewable energy diversification 
all over the province. The global data sets and analytical tools 
at the National Renewable Energy Laboratory (NREL) and, 
specifically for India, at the Centre for Wind Energy Technology 
(C-WET) and the India Meteorological Department (IMD) permit 
modeling of wind and solar radiation resource predictions [19]. 
The proposed model for energy computing and mapping is an 
extension of the ARIMA model to Indian conditions. The ARIMA 
prediction algorithm model by Peng Chen et al. provides a 
reliable base for popularization of renewable energy source 
applications in building construction; its key technologies 
include the multi-level system framework, functional modules 
and database design [11]. The present study is focused on two 
types of renewables, i.e. solar and wind [14]. 
The proposed method has been employed in a predictive 
model and has higher accuracy for time sequences. In this 
study, the autoregressive integrated moving average 
(ARIMA) model is studied and analysed for adoption within 
the considered conditions [9]. Predictive monitoring is then 
done by employing the model and comparing the predictive 
monitoring results with historical data, so as to achieve 
even better predictive monitoring results. The formula 
used in the ARIMA model is as follows: 

$$X_t = \varphi_1 X_{t-1} + \dots + \varphi_p X_{t-p} + a_t + \theta_1 a_{t-1} + \dots + \theta_q a_{t-q} \qquad (1)$$

where $\varphi_1, \dots, \varphi_p$ are the autoregressive 
coefficients, $p$ is the autoregressive order, 
$\theta_1, \dots, \theta_q$ are the moving average 
coefficients, $q$ is the moving average order, $\{a_t\}$ is the 
noise sequence, $X$ is the original data sequence, and $y_t$ is 
a stationary sequence formed by differencing $d$ times 
[11]. 
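
A minimal sketch of how such a model produces a one-step-ahead prediction is given below. It assumes the sign convention in which the moving average terms are added (some texts subtract them), applies the recursion to the differenced sequence, and then undoes the differencing. The coefficients, residuals and readings are hypothetical values chosen for illustration, not fitted parameters from the study.

```python
def difference(series, d):
    """Apply first-order differencing d times to obtain a stationary sequence."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(y, residuals, phi, theta):
    """One-step-ahead ARMA(p, q) prediction.

    y         : past values of the (differenced) sequence, oldest first
    residuals : past noise estimates a_t, oldest first
    phi       : autoregressive coefficients [phi_1, ..., phi_p]
    theta     : moving average coefficients [theta_1, ..., theta_q]
    The future noise term has zero expectation and is therefore omitted.
    """
    ar = sum(c * y[-i] for i, c in enumerate(phi, start=1))
    ma = sum(c * residuals[-j] for j, c in enumerate(theta, start=1))
    return ar + ma


# Hypothetical hourly readings and coefficients (assumed, not from the paper).
x = [10.0, 12.0, 15.0, 19.0, 24.0]
y = difference(x, d=1)              # differenced sequence with d = 1
phi, theta = [0.5], [0.2]
residuals = [0.0, 0.0, 0.0, 0.1]    # assumed past noise estimates

y_next = arma_one_step(y, residuals, phi, theta)
x_next = x[-1] + y_next             # undo the single differencing step
print(f"predicted next value: {x_next:.2f}")
```

In practice the coefficients would be estimated from historical data, and the predictions compared against that history as the text describes.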


3.1 Solar Energy Mapping 

Solar radiation assessment stations provide direct measurements 
of the available global solar radiation; this is called the direct 
method. For locations where such data was not available, 
indirect methods were used [19]. The indirect methods are as 
follows: 

- From extra-terrestrial radiation, allowing for its depletion 
by absorption and scattering by atmospheric gases, dusts, 
aerosols and clouds. This is theoretically based and 
requires some approximation of the absorbing and the 
scattering properties of the atmosphere. 

- From other meteorological elements, such as duration of 
sunshine and cloudiness, using a regression technique. This 
method is empirically based, and the form usually used 
involves actual and potential hours of sunshine, which 
gives the regression constants for global and diffuse solar 
radiation at the particular location or site. 
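
The sunshine-based regression in the second method is commonly written in the Angstrom-Prescott form H/H0 = a + b (n/N), where H is the global radiation, H0 the extra-terrestrial radiation, n the actual and N the potential hours of sunshine. The sketch below fits the regression constants a and b by ordinary least squares. The data points are synthetic (generated with a = 0.25 and b = 0.50) and the function names are illustrative assumptions, not the procedure used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic monthly values for one site (not measured data).
sunshine_fraction = [0.45, 0.55, 0.65, 0.75, 0.85]              # n / N
clearness_index = [0.25 + 0.50 * s for s in sunshine_fraction]  # H / H0

a, b = fit_line(sunshine_fraction, clearness_index)


def estimate_global_radiation(h0, sunshine_hours, potential_hours):
    """Estimate global radiation from sunshine duration with the fitted constants."""
    return h0 * (a + b * sunshine_hours / potential_hours)


print(f"a = {a:.3f}, b = {b:.3f}")
```

With real station records the fitted constants would differ from site to site, which is why the text ties them to the particular location.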

The solar energy data is collected, documented and analysed by 
the Ministry of New & Renewable Energy (MNRE) and the India 
Meteorological Department. MNRE has published the solar 
radiation potential map for India [8]. Solar energy is 
converted into useful energy by the two techniques explained 
here. 


A) Photovoltaic Power 

Solar photovoltaic power is the direct form of solar energy 
utilization: non-polluting, effective and easy power generation 
that can run either independently or in parallel with the grid. 
An independently running solar photovoltaic power generation 
system requires a battery as the energy storage device and is 
chiefly adopted in remote areas without a power grid and in 
sparsely populated areas; however, the whole system is rather 
costly. In areas where a power grid is available, parallel 
running not only lowers the cost greatly but is also highly 
efficient and environmentally friendly. 

To evaluate the efficiency of solar photovoltaic power applied 
in buildings, the global solar radiation, the solar radiation 
capacity and the effective radiation surface area of the solar 
cell array are collected systematically, and the economic 
indicators are calculated as follows: 

1. The global solar radiation $I_p$ obtained from the surface of 
the solar cell array 

2. The energy $P_v$, in the form of electrical voltage and 
current, produced by the solar cell array 

3. The inverter loss $L$ during conversion to usable energy 

4. The substituted quantity $S_{pv}$ of conventional energy power 
conversion 

Based on the above-listed economic indicators, which are the 
important data for evaluating solar photovoltaic power 
efficiency applied in buildings, the assembled economic 
indicator set of the solar photovoltaic power model is 

$$E_{PV} = \{ I_p, P_v, L, S_{pv} \} \qquad (2)$$

The solar photovoltaic (PV) market saw another year of 
extraordinary growth. Almost 30 GW of new solar PV capacity 
came into operation worldwide in 2011, increasing the global 
total by 74% to almost 70 GW, as shown in Figure 1 [16]. 
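
The four photovoltaic indicators can be held together as one record, as in the sketch below. The relations used to fill it (array output as radiation times area times module efficiency, and substituted energy as the output remaining after inverter loss) are simplifying assumptions made for this illustration; the paper does not state these formulas, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PVIndicators:
    """The four indicators listed above, with assumed units."""
    i_p: float   # global solar radiation on the array surface, kWh/m^2
    p_v: float   # electrical energy produced by the array, kWh
    loss: float  # inverter loss as a fraction between 0 and 1
    s_pv: float  # conventional energy substituted, kWh


def pv_indicators(i_p, area_m2, module_efficiency, inverter_loss):
    """Assemble the indicator set under the simplifying assumptions above."""
    p_v = i_p * area_m2 * module_efficiency
    s_pv = p_v * (1.0 - inverter_loss)
    return PVIndicators(i_p=i_p, p_v=p_v, loss=inverter_loss, s_pv=s_pv)


# Hypothetical daily values for a 20 m^2 array (not data from the paper).
indicators = pv_indicators(i_p=5.0, area_m2=20.0,
                           module_efficiency=0.15, inverter_loss=0.04)
print(indicators)
```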

The solar photovoltaic (PY) market saw another year of 

extraordinary growth. Almost 30 GW of new solar PV capacity 

came into operation worldwide in 2011, increasing the global 

total by 74% to almost 70 GW as shown in figure | [16]. 


B) Solar Thermal 

The solar thermal system is another form of solar energy 
utilization. The system collects solar radiation energy 
through a device named a heat arrester and passes it to a heat 
exchanger. Such an installation is presently the most 
economical and technically mature product, and is already 
commercialized [3][21]. While evaluating the efficiency of the 
solar energy arrester, the following five economic indicators 
shall be considered: 

1. Solar energy assurance factor $\varphi$ 

2. Solar energy heat collecting system efficiency $\eta_c$ 

3. Heat exchanger efficiency $\eta_h$ 

4. Useful heat quantity of the solar heat collecting system 
$Q_{ur}$ 

5. Substitution quantity of conventional energy sources 
$S_{pv}$ 

Based on the above-listed five economic indicators, the 
assembled indicator set of solar water heating is thus obtained as 

$$E_{th} = \{ \varphi, \eta_c, \eta_h, Q_{ur}, S_{pv} \} \qquad (3)$$

Figure 2: Concentrating Solar Thermal Power 2011 


The concentrating solar thermal power (CSP) market continued 
its steady growth in 2011. More than 450 MW of CSP was 
installed, increasing total global capacity by 35% to nearly 
1,760 MW [16]. The market was down relative to 2010, but 
significant capacity was under construction at year’s end. 
Over the five-year period of 2006-2011, total global capacity 
grew at an average annual rate of almost 37% (see Figure 2). 
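
The growth figures quoted above can be related to each other with two small helper functions, sketched below. The helper names are illustrative; the values derived from the rounded figures in the text ("nearly", "almost") are approximations, not reported statistics.

```python
def capacity_before_growth(capacity_after, growth_fraction):
    """Capacity at the start of a period, given the end value and the growth over it."""
    return capacity_after / (1.0 + growth_fraction)


def average_annual_growth(start, end, years):
    """Compound average annual growth rate between two capacity values."""
    return (end / start) ** (1.0 / years) - 1.0


# Rounded figures quoted in the text: about 1,760 MW after 35% growth in 2011.
start_2011 = capacity_before_growth(1760.0, 0.35)
added_2011 = 1760.0 - start_2011
print(f"implied capacity at start of 2011: about {start_2011:.0f} MW")
print(f"implied addition during 2011: about {added_2011:.0f} MW")
```

The implied addition is consistent with the "more than 450 MW" stated above, within the rounding of the quoted percentages.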


3.2 Wind Energy Computation 

Wind energy potential is calculated from annual average wind 
speed data. Annual average wind velocity data for 
wind-monitoring stations across Indian states are collected 
by the India Meteorological Department (IMD). To analyze 
variations across seasons, the data were grouped season-wise as 
summer (February to May), monsoon (June to September) and 
winter (October to January). Season-wise mean wind velocity 
and standard deviation are computed for the wind-monitoring 
stations. GIS is used for mapping wind resources spatially and 
to quantify and analyse temporal changes. Based on these, GIS 
thematic layers are generated, which help in assessing 
the variability. The map helps to identify the most and the least 
suitable potential areas for harnessing wind energy. 
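
The season-wise grouping described above can be sketched as follows for a single monitoring station. The season boundaries follow the text; the monthly wind speeds are hypothetical values for illustration, and the sample standard deviation is used as an assumption since the text does not say which form was computed.

```python
from statistics import mean, stdev

# Season definitions taken from the text (month numbers, 1 = January).
SEASONS = {
    "summer": (2, 3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "winter": (10, 11, 12, 1),
}


def seasonal_stats(monthly_speed):
    """Return {season: (mean, sample standard deviation)} for one station.

    monthly_speed maps month number (1-12) to mean wind speed in m/s.
    """
    stats = {}
    for season, months in SEASONS.items():
        values = [monthly_speed[m] for m in months]
        stats[season] = (mean(values), stdev(values))
    return stats


# Hypothetical monthly mean wind speeds for one station, in m/s.
station = {1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 4.0, 6: 6.0,
           7: 7.0, 8: 6.0, 9: 5.0, 10: 3.0, 11: 2.0, 12: 2.0}

for season, (avg, sd) in seasonal_stats(station).items():
    print(f"{season}: mean = {avg:.2f} m/s, std = {sd:.2f} m/s")
```

Running the same function over every station gives the per-season values that would feed the GIS thematic layers.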

The wind turbine power curve is defined as the power output 
of the machine as a function of wind speed. The behavior of the 
output power of the machine is generally dependent on four 
characteristic parameters. It is assumed that power generation 
starts at the cut-in wind speed $V_C$ (m/s), that the output 
power increases as the wind speed increases from $V_C$ to the 
rated wind speed $V_R$ (m/s), and that a constant value of the 
output power, namely the rated power $P_R$ (kW), is produced 
when the wind speed varies from $V_R$ to the cut-out wind speed 
$V_F$ (m/s), which is the maximum wind speed at which the 
turbine can work correctly. 

The linear wind model assumes a linear (affine) dependence 
(within the interval $[V_C, V_R]$) of the wind turbine power 
output $P^t$ on the current wind speed at hub height $V^t$, 
for $t = 0, \dots, T-1$, where $T$ is the time horizon in hours. 
In detail: 

$$P^t = \begin{cases} 0, & V^t < V_C \\ P_R\,(a + b\,V^t), & V_C \le V^t < V_R \\ P_R, & V_R \le V^t \le V_F \\ 0, & V^t > V_F \end{cases} \qquad t = 0, \dots, T-1 \qquad (4)$$

It should be observed that the wind speed $V^t$ in (4) is that 
corresponding to the wind turbine hub height $H_{hub}$. Since, in 
general, wind speed data can be measured or forecasted with 
reference to a height $H_{ref}$ that is different from the hub 
height, it is necessary to use an equation relating the wind 
speed at hub 




Survey of Energy Computing in the Smart Grid Domain 


height with the wind speed $V^t_{ref}$ at $H_{ref}$, taking into 
account the surface roughness length, which is a parameter that 
can be estimated on the basis of the land use at the wind farm 
location. The assembled indicator set for wind energy is 

$$E_W = \{ P_R, V_C, V^t, V_R, H_{hub} \} \qquad (5)$$
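
The height correction and the piecewise power curve described above can be sketched together as follows. Two points are assumptions of this sketch rather than statements from the paper: the logarithmic wind profile is used as the relation between reference-height and hub-height speeds (one common choice that uses the roughness length), and the affine segment is chosen so that the output is zero at cut-in and equals the rated power at the rated speed. All turbine parameters are hypothetical.

```python
import math


def speed_at_hub(v_ref, h_ref, h_hub, roughness_length):
    """Scale a wind speed from the reference height to hub height.

    Uses the logarithmic wind profile (an assumed choice); heights and
    roughness length are in metres.
    """
    return v_ref * math.log(h_hub / roughness_length) / math.log(h_ref / roughness_length)


def turbine_power(v, p_rated, v_cut_in, v_rated, v_cut_out):
    """Piecewise-linear power curve: zero, affine ramp, rated power, zero."""
    if v < v_cut_in or v > v_cut_out:
        return 0.0
    if v < v_rated:
        # Affine ramp, assumed continuous at cut-in and rated speed.
        return p_rated * (v - v_cut_in) / (v_rated - v_cut_in)
    return p_rated


# Hypothetical turbine: 1000 kW rated, cut-in 3 m/s, rated 12 m/s, cut-out 25 m/s.
measured = [2.0, 4.0, 6.0, 9.0, 14.0]   # hourly speeds at 10 m, in m/s (assumed)
for t, v_ref in enumerate(measured):
    v_hub = speed_at_hub(v_ref, h_ref=10.0, h_hub=80.0, roughness_length=0.1)
    p = turbine_power(v_hub, 1000.0, 3.0, 12.0, 25.0)
    print(f"t={t}: hub speed {v_hub:.2f} m/s, power {p:.1f} kW")
```

Summing the hourly outputs over the time horizon gives the energy estimate for the period under these assumptions.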

During 2011, an estimated 40 GW of wind power capacity was 
put into operation, more than any other renewable technology, 
increasing global wind capacity by 20% to approximately 238 
GW, as shown in Figure 3. 

Around 50 countries added capacity during 2011; at least 68 
countries have more than 10 MW of installed capacity, and 22 of 
these have crossed 1 GW. The top 10 countries account for 
nearly 87% of total capacity. Over the period from end-2006 to 
end-2011, annual growth rates of cumulative wind power 
capacity averaged 26% [16]. 





Figure 3: Wind Power installed 


4. ECONOMICS OF ENERGY COMPUTATION 
To achieve accurate predictive analysis, the relevant 
influential factors in energy application have been considered 
in the design of the energy model in this article. The 
computation model uses the abundant collected data and the 
economic indicator models to obtain the best solution for 
energy economics. A closed-loop predictive system is formed 
based on a timing algorithm, so that the predictive model can 
provide accurate prediction in the light of varied data 
[18][22][23]. 
To sum up, taking the three types of energy sources as 
examples, and considering that the platform needs to predict, 
compile statistics on and analyse the data, the overall energy 
model is designed as shown in Figure 4. The proposed computing 
models have been designed to target the following requirements 
of the Distributive Generation System (DGS) [2][24][25]: 
- Analyze the situation and decide the data collection 
strategy and methodology for new and renewable sources. 
Collect and collate the relevant data required for modeling. 
- Apply conceptual modeling to the design of an integrated 
system, such as input on energy sources for the design of a 
hybrid power plant that exploits the maximum renewable 
energy sources at a reasonable price. 
- Either apply the proposed models or, in addition, develop 
mathematical models for simulating environmental impact. 
- Generate different scenarios to ultimately arrive at an 
effective environment management plan that supports the 
decision makers. 

The Control Design and Simulation Module (CDSM) provides 
a numerical simulation environment that enables users to test 
the model. CDSM is used to analyse the interactions within a 
hybrid power solution comprising mechanical and electrical 
systems [25]. Furthermore, the quality of existing models can 
be improved and other control strategies can be investigated by 
simulating deep-bar induction generators and more complex 
models of drive trains [15]. 
[Diagram omitted. Labels: Overall Energy Model; Economic Indicators Analysis] 

Figure 4: Integrated Energy Model 


5. CONCLUSION 

The computing model proposes to develop algorithmic formulas 
for diversified renewable energy sources and building-integrated 
projects. The proposed platform will also be able to conduct 
predictive analysis on the vast accumulated historical data, to 
help select the energy resource that is most economical and 
efficient. Furthermore, a statistical and analytical function is 
envisaged for this platform which can display comparisons of 
the same indicators of different projects or different 
indicators of the same project, hence providing a basis for 
popularization of renewable energy saving in different areas. 

Abstract - As the world goes through a technological 
revolution with the dawn of the digital age, educationists are 
in some ways compelled to rethink the existing education 
system and its components. With the tools and techniques now 
available, it is imperative to reconsider how they can be used 
to improve educational institutions and associated bodies. 
Opportunities for knowledge discovery in educational data 
have increased tremendously with the digital revolution 
compared to the past. Educational data is becoming 
increasingly rich as more and more educational systems go 
online and collect large amounts of data. In this paper a study 
of an enrollment dataset is presented. 


Index Terms - Data Analytics, Educational Data Mining, 
Enrollment data, Adaptive Educational Hypermedia. 


1. INTRODUCTION 

In this new digital age, the world of education has also 
undergone a major transformation. The new technologies and 
gadgets available not only help enrich and enhance the existing 
education system but also offer new opportunities and modes 
which can take the process of learning beyond institutions and 
allow people to learn in their own time and on their own terms. 
These new advances in learning have played a big role in this 
age of knowledge enhancement via different means and are 
clearly a sign that there is a need to rethink how the potential 
of technology can be tapped to improve our education system 
[1, 2]. As of now, most of the changes can be seen in the way 
information is stored, retrieved, distributed or provided to the 
students, such as educational technology, e-learning portals, 
Learning Management Systems used in distance learning, 
blended learning and so on. Another emerging and associated 
area is educational data mining (EDM), where storage, retrieval 
and analysis of educational data sets can be leveraged to 
revolutionize education systems [3]. 

People in all fields and disciplines are becoming more and 
more informed. They are learning to observe, collect and 
interpret data trends around them to make better and informed 
decisions [4, 5]. Analysis of educational data sets is required to 
understand needs of current society and then also cut down 
costs in the process [4, 6]. In order to clearly define the 
framework and the needs for revolutionizing the current 
education system data have to be analyzed as a first step. In this 
paper such a study & some solutions are described. 

7 School of Computer and Information Sciences, Indira 
Gandhi National Open University, Delhi - 110068, India 

2. EDUCATIONAL DATA MINING 

In the last few years EDM has emerged as a field of its own. 
The EDM community website [7] defines EDM as follows: 
“EDM is an emerging discipline, concerned with developing 
methods for exploring the unique types of data that come from 
educational settings, and using those methods to better 
understand students, and the settings which they learn in.” Data 
mining (DM), or Knowledge Discovery in Databases (KDD), is 
the field of discovering novel and potentially useful 
information from large amounts of data [8]. It finds 
applications in the fields of artificial intelligence, numeric & 
combinatorial optimization, business, management, medicine, 
computer science, engineering etc. [9]. DM largely consists of 
analyzing available sets of data to interpret and isolate the 
trends and patterns present in the data, i.e. converting raw data 
into information. The trends obtained can be called predictions 
or recommendations [10]. These can be used by educators, 
educational software developers, teachers, parents or students. 
However, it is largely understood that EDM methods are often 
different from standard DM methods. This is because of the 
non-independence and multilevel hierarchy found in 
educational data. For the same reason, it is increasingly 
common to see psychometric models being used in EDM 
[11]. DM is a part of Data Analysis. The outcomes of 
data-based research can be descriptive or actionable; this study 
includes both. 

DM can be visualized as a confluence of multiple disciplines 
where the background knowledge pertaining to the area of 
study is processed using tools pertaining to other disciplines 
such as information science, database technology, statistics, 
machine learning and other related fields. Here the ‘Area of 
Study’ would be ‘Education’. The data can be collected from 
students’ use of interactive learning environments, 
computer-supported collaborative learning, evaluation, 
assessment or administrative data (web logs, library usage) 
from schools and universities. There are various challenges in 
the field of education, like understanding choice of major, 
appropriate evaluation schemes, student drop out, retention, 
student unrest and crime, assessment of institutions, and 
educationists’ goals like quality, access, cost, and social and 
cultural biases. Educational efficacy can be measured and 
predicted using DM methods [12]. DM is a field which has 
originated from databases and Artificial Intelligence [13]. 
Understanding the current trends of our education system could 
point towards the underlying issues and help us devise an 
effective plan to address them. Figure 1 shows two broad 
possible dimensions of EDM research: utilizing the data from 
the point of view of educators, and from the point of view of 
those studying the management/administrative aspect of EDM. 

Figure 1: Two possible dimensions of Educational Data 
Mining research. 


2.1 EDM for Educators 
An example for such type of EDM is PSLC (Pittsburgh Science 
of Learning Centre) [14, 15]. 


2.2 EDM for Administrator and Managers 

Education administrators also use EDM to understand 
management-related factors such as demographics, enrollment 
etc. An example of EDM for administrators is the presented 
case study of enrollment data consisting of 3020 records 
obtained from the SRD (Student Registration Division) of 
IGNOU (Indira Gandhi National Open University). The data 
files are in dbf format (figure 2) and can be imported into 
MS-Excel (figure 3) using FoxPro. In dbf format, one column 
fills an entire screen, as in figure 2, whereas figure 3 has a 
more readable tabular representation. 





Figure 2: .dbf data file in MS-Visual Studio’s Visual 
FoxPro (raw data file as received) 

Figure 3: .xlsx data file in MS-Excel (cleaned data file) 

For this research, enrollment data of disabled students for the 
entire year 2009 was obtained from SRD in January 2010. 
After data cleaning (using pivot tables) some interesting 
patterns were obtained. The graphs obtained from this analysis 
are shown below and discussed in the next section. The research 
methodology follows [16]. A wide variety of DM methods are 
available, such as prediction, clustering, relationship mining, 
discovery with models, and distillation of data to obtain and 
present knowledge [17, 18]. Some relevant studies can be found 
in [19, 20, 21]. 


3. STRATEGY FOR DATA CLEANING 

Data parsing [22] is easy after importing the file from FoxPro 
into MS-Excel because every column can then be viewed 
separately. The values are already standardized and discrete, 
and can be verified against the university website and 
prospectus. Records were matched to check that no student’s 
enrollment number, which is the primary key, is repeated. 
Necessary transformations can be done, e.g. to get age from 
date of birth. Overall, MS-Excel turned out to be a good tool 
for data cleaning. 
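
The two checks mentioned above, uniqueness of the primary key and derivation of age from date of birth, can be sketched as follows. The study itself used MS-Excel; this is only an illustrative equivalent, and the field names and records are hypothetical.

```python
from collections import Counter
from datetime import date


def duplicate_keys(records, key="enrollment_no"):
    """Return the primary-key values that occur more than once."""
    counts = Counter(record[key] for record in records)
    return sorted(k for k, c in counts.items() if c > 1)


def age_on(date_of_birth, reference):
    """Completed years of age on the reference date."""
    years = reference.year - date_of_birth.year
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical records; field names are assumptions for this sketch.
records = [
    {"enrollment_no": "A001", "dob": date(1985, 6, 15)},
    {"enrollment_no": "A002", "dob": date(1990, 1, 1)},
    {"enrollment_no": "A001", "dob": date(1985, 6, 15)},
]

print("repeated enrollment numbers:", duplicate_keys(records))
for record in records:
    print(record["enrollment_no"], age_on(record["dob"], date(2010, 1, 1)))
```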

‘Cold start’ is when a data miner has to start from scratch or 
‘zero’, as in this study. Typical real-world data sets, which 
are unformatted (raw), need to go through data cleaning steps 
[22] to be used successfully in a study. After formatting the 
data appropriately, the pivot table feature of Excel (a 
statistical tool in the MS-Office package) was used for data 
cleaning. Suppose the variable under consideration is ‘State’. 
The various occurrences of the State ‘Delhi’ can be counted as 
in figure 4. Blanks and wrong entries (‘Del’, ‘Dilli’) also get 
marked. This also explains why local understanding of the 
database can sometimes be crucial. 
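
The pivot-table count described above can be mimicked with a simple tally that separates valid codes, wrong codes and blanks. The list of valid states and the sample values are hypothetical; deciding that ‘Del’ and ‘Dilli’ both mean ‘Delhi’ still needs the local knowledge the text mentions.

```python
from collections import Counter

# Hypothetical subset of valid codes, assumed for this sketch.
VALID_STATES = {"Delhi", "Haryana", "Uttar Pradesh"}


def pivot_count(values, valid):
    """Count occurrences of each value and separate wrong codes and blanks."""
    counts = Counter((value or "").strip() for value in values)
    blanks = counts.pop("", 0)
    ok = {v: c for v, c in counts.items() if v in valid}
    wrong = {v: c for v, c in counts.items() if v not in valid}
    return ok, wrong, blanks


# Hypothetical 'State' column with typical problems.
state_column = ["Delhi", "Del", "Dilli", "", None, "Delhi", " Haryana "]
ok, wrong, blanks = pivot_count(state_column, VALID_STATES)
print("valid:", ok)
print("wrong codes:", wrong)
print("blanks:", blanks)
```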


Figure 4: Pivot table used to count variables & detect blanks 
and wrong code. 


4. RESULTS 

In this section some of the results obtained using the pivot 
table feature of MS-Excel are presented. The data analysis was 
conducted on disabled students of IGNOU who enrolled for 
various courses in the year 2009. 

4.1 54.11% of students are in the young age group (figure 5). 

Figure 5: Graph showing age group distribution of students 
where last column represents ‘Wrong code’. 


4.2 37.11% of students enrolled for Master of Political Sciences 
and paid an average fee (figure 6). 





Figure 6: Distribution of courses/programs opted for by the 
students. 


4.3 68.01% of students opted for English Medium (figure 7). 





Figure 7: Distribution of medium of instruction as opted by 
students. 


4.4 More than 70% of students are male (figure 8). 

Figure 8: Pie chart showing distribution of gender across 
student population. 


4.5 Most students had finished their previous educational 
qualification within the past decade as indicated in figure 9 
below. 


Figure 9: Distribution of students according to the year in 
which they finished their previous educational 
qualification. 


4.6 54.4% of students are unemployed and 28.1% are employed 
by IGNOU itself. This shows that the share of students who 
are pursuing education while being employed elsewhere is 
only 15.3% (figure 10). 

Figure 10: Distribution of students according to employment 
status 

4.7 Analysis of the territory code of students' addresses shows 
that 60% of students are from urban areas (figure 11). 





Figure 11: Distribution of student population as per 
territory code. 


4.8 To assess accessibility, we analyzed the email ID 
field, which showed that more than 71% of students don’t have 
email ids (figure 12). 





Figure 12: No e-mail ids indicate lack of access to internet 
and technology. 


5. DISCUSSION WITH POSSIBLE ACTIONS 

Above results indicate knowledge divide and digital divide. 

Better methods of increasing outreach are required. 

5.1 Result 1 is self-explanatory. Figure 5 has the approximate 
shape of a normal distribution [23], as often exhibited in 
biological data [24]. This also resonates well with the 
model of our current education system, where most people 
like to study or focus on adding value to their careers in their 
twenties or early thirties. The curve is not perfectly symmetric: 
it is skewed to the left and slants a bit on the right. 

5.2 Result 2 is due to the fact that these students find it easier to 
take humanities or social science courses, because there is 
no help in the form of artificial limbs & training to use 
them in (science) laboratories. Science laboratories have 
no accessibility equipment or area. 

5.3 English medium books are comparatively easily available 
in India at higher education level. IGNOU, however, plans 
to launch courses in regional languages. Further steps that 
can be taken are to encourage translation of books/texts 
in all subjects and to make them accessible through Braille 
translation, audio books (record and release), video field 
tours and online repositories of all these educational 
media. 

5.4 Result 4 shows that disability is more common in males in 
this data set, but it raises more questions about infant & 
child mortality rate, gender biases or gender divide. 

5.5 Result 5 is self-evident. Students are in their 20s, so they 
have mostly passed out in the recent past. Figure 9 has the shape of 
a chi-square distribution [23, 24], with one outlier: 
‘current’ year pass outs. 

5.6 Result 6 requires action from Governments to create 
accessible jobs to increase employment. 

5.7 Result 7 indicates the possibility of urban area students 
having better accessibility to these courses i.e. Knowledge 
Divide. This needs to be verified by designing a focused 
future study for the same and improving awareness and 
accessibility in other areas as well. 

5.8 No e-mail ids indicate lack of access to internet and 
technology for disabled students i.e. Digital Divide. 


6. ACCESSIBILITY AND TRACKING 
There is a need to improve content delivery. It may help in 
decreasing digital and knowledge gaps. Currently quite a few 
e-learning and online information delivery platforms are 
designed with a “One-size-fits-all” approach. The existing distance 
education system lacks interactivity and can lead to a lack of 
motivation and interest. There is a need for flexible education 
systems which can also provide guidance as per capacity & 
learning level [25]. 

Adaptive Educational Hypermedia (AEH) are flexible and 
customizable, so that an appropriate lesson can be provided for each 
student. Various views of the same material are created, as desired 
by the user. This can be done by maintaining a student 
enrollment database combined with a user behavior database, 
using tools like link removal (figure 13), stretch text (figure 14) 
and course monitor (figure 15) for all those students of the 
university who are using online resources. These proposed 
tools utilize options set by the user & also track and record actions 
of the user, such as which media type is chosen most often. 
The combined database forms a user model [26] or a student model: 
goal, previous knowledge, previous performance, 
background, experience, preferences, stereotypes, 
preferences supplied by the user at run time, analysis of user actions & 
plan recognition or inference. Providing varying views of the 
same content is a paradigm shift away from “write once, use 
once” towards a middleware system [26]. 


Figure 13: Link removal tool [27] 

This link removal tool saves the time spent on looking for the most 
preferred type of media by avoiding confusion. The stretch text tool 
serves the same purpose and also makes the interface user 
friendly. Such tools adapt to the habits & needs of a user 
(Human Computer Interface, i.e. HCI). 

Figure 14: Stretch text tool [28] 


Relations in the concept ensure that the student has a study 
guideline to follow and clearly knows the prerequisites and the 
predecessors for each study. If certain specific prerequisites are 
not fulfilled, the learner will be prompted for the same by the 
course monitor tool. This tool is in accordance with the Skinnerian 
or Linear Approach and can be combined with Programmed 
Learning. A rule for this can be as below. More components 
can be added, such as interest and repetition. 

If c1.access = true then set c2.allow_access = true else 
c2.allow_access = false 

If (c1.access = true or c1.test_passed = true) then set 
c2.allow_access = true else c2.allow_access = false (to include 
assessment & evaluation options) 

Concepts from various disciplines can be combined in a “many to 
many” approach. 
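
The two course monitor rules given above can be sketched in Python as follows. The class and function names are hypothetical and are chosen only to mirror the fields in the rules (access, test_passed, allow_access); this is not the implementation of the tool cited in [29].

```python
from dataclasses import dataclass

@dataclass
class Concept:
    """One unit of study tracked by a hypothetical course monitor."""
    name: str
    accessed: bool = False      # corresponds to c1.access in the rule
    test_passed: bool = False   # corresponds to c1.test_passed in the rule
    allow_access: bool = False  # corresponds to c2.allow_access in the rule

def apply_basic_rule(prerequisite, target):
    """First rule: the target opens only if the prerequisite was accessed."""
    target.allow_access = prerequisite.accessed
    return target.allow_access

def apply_assessment_rule(prerequisite, target):
    """Second rule: accessing the prerequisite or passing its test opens the target."""
    target.allow_access = prerequisite.accessed or prerequisite.test_passed
    return target.allow_access

c1 = Concept("c1", test_passed=True)
c2 = Concept("c2")
basic_result = apply_basic_rule(c1, c2)
assessment_result = apply_assessment_rule(c1, c2)
print(basic_result, assessment_result)
```

Further components such as interest or repetition could be added as extra fields on the same record.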





Figure 15: Course monitor tool [29] 
7. SCALING IN EDM 
A Data Mining problem can be solved through Generalization. 
To achieve a high degree and accuracy of generalization, a data 
miner needs a large number of records and more resources in 
terms of access, time, permissions, teams [30], better software, 
machines and related facilities as shown in figure 16. Over a 
longer period of time, Agility in Data Analysis can be achieved 
through Software Re-Engineering. Real world implementations 
have high complexities [31]. Using tools like WEKA for data 
mining can give meaningless or useless results. Intermediate 
steps are not shown, much as in a calculator, so the required level of 
understanding may not be obtained. Tools like SPSS require 
higher system configurations which may not be available to a 
researcher. 
8. CONCLUSION AND FUTURE SCOPE 
Analysis of educational data was discussed from the approach 
of administration. Understanding of the variables provided 
exhibited a digital divide and a knowledge divide. AEH can be 
used to improve content delivery and may help in decreasing 
digital and knowledge gaps. It was observed that for developing 
EDM models, the data obtained should be focused and well 
organized to achieve effectiveness. At IGNOU, where the 
dataset was collected from an administrative focus and without 
a preexisting problem statement, more data collection & further 
studies based on them are required to predict the trends. 
Studying educational data sets can aid in suggesting pedagogies 
(teaching methods), site modification, intelligence services & 
age recommendations in the long run [22, 32]. 

Copy Right © BIJIT - 2013; January - June, 2013; Vol. 5 No. 1; ISSN 0973 - 5658 





Figure 16: Team work in EDM 

To improve the analysis and usability of enrollment data, the 
enrollment form at universities should be improved to collect 
relevant/targeted data fields [33] such as degree of disability, 
monthly/annual income of the family and other personal data of 
the family, the individual student and assessment. Online enrollment 
can facilitate collection of data for analysis. Such e-forms can 
be made adaptive in nature. Background information and 
performance of every pupil can be assessed throughout their 
academic life and employment/career to clearly identify any 
patterns and correlations in the data. 

Another possible study can be to analyze the assessment data 
[34, 35] & log files of these students to find non-independence 
and multilevel hierarchies in educational data [36]. Such 
analysis can help us provide more useful insights in the 
education system of IGNOU for disabled students and to 
understand the factors affecting students’ learning and career 
path development. 


Abstract - Knowledge representation is the basis for expressing the 
semantic content of input in intelligent information retrieval 
systems. Identification of semantics requires processing of the 
input language at various levels. Making a system understand 
text or speech is a challenging task, as it involves extracting the 
semantics of the language, which is itself a complex problem. 
At the same time, languages possess multiple ambiguities 
and uncertainties which need to be resolved at various phases 
of language processing. The level of understandability depends 
upon the grammar, the syntactic and semantic representation of 
the language and the methods employed for these analyses. 
Processing depends on the type of language, the grammar of the 
language, the ambiguities present and the size of the corpus available. 
An order free language possesses different features as compared to 
a rigid order language. Most of the Indian languages are order 
free; hence a mechanism for such languages needs to be 
formulated. One of the ancient Indian Sanskrit grammarians, 
pAninI, defined the grammar of the Sanskrit language in such a 
way that it is suitable for computational analysis. The six main 
semantic classes identified under this theory are a baseline model 
for knowledge representation. This paper exploits the features 
of the language and the applicability of rules, and resolves 
ambiguities using a neural network model. A hybrid model 
incorporating the features of the rule based and neural network 
approaches is designed and implemented for pAninI based semantic 
analysis, generating case frames as output. 


Index Terms - pAninI Grammar framework, Knowledge 
Representation, Case Frame, Natural Language Processing, 
Semantic. 


1. INTRODUCTION 

Knowledge representation (KR) is a technique to represent the 
meaningful and logical content embedded in the language in a 
structured form. Development of such a tool requires an 
exhaustive analysis of the input language at the syntactic and semantic 
levels, with the capacity to handle ambiguities at each level. Natural 
languages are not so natural for computer processing; hence a 
KR tool acts as a bridge between the natural language and the 
understanding of language by a machine. Development of such a 
tool is heavily guided by language processing techniques and the 
type of language. An order free language possesses different 
characteristics than a rigid order language. As most of the Indian 
languages are order free, they require a different mechanism to 
handle their processing. KR, Natural Language Processing 
(NLP) and Information Retrieval (IR) are closely related modules of such 
applications, as depicted in Figure 1. 

Figure 1: Inter relation between NLP and KR 
Statistical methods are applied for syntactic analysis of Indian 
languages, with the Hidden Markov Model (HMM) [12] and Support 
Vector Machine (SVM) being popular statistical techniques [4] 
[19]. Application of neural networks to the classification task is 
less observed, as both are complex domains. This paper presents a 
method for generation of Case Frames (CF) as the KR structure for 
the Sanskrit language under the pAninI framework. The method identifies the 
semantic role of each word with respect to the action or verb present 
in the sentence, thereby presenting a verb-argument relation. 
Six main semantic classes are defined under the pAninI framework. 
Identification and classification of a word into one of the classes is 
achieved by analyzing the suffix attached to the word. The identified class 
along with the word is stored in a KR structure called CF. However, 
while performing the classification, one suffix may map into 
multiple domains, resulting in conflicting output. Such conflict 
is resolved by training a neural network for the ambiguous cases. Non 
conflicting cases are handled by one-to-one vibhakti-kAraka 
mapping, resulting in a hybrid model for case frame 
generation. This paper describes the concept of pAninI grammar 
for semantic analysis, the database of suffixes, the algorithm and 
solutions for conflict cases. KR based systems are widely used in 
applications like translation systems, learning algorithms and 
question answer based systems. 





2. pAninI GRAMMAR 

One of the ancient languages of the world, Sanskrit, has a well 
defined grammatical and morphological structure which 
precisely defines the relation of the suffix-affix of the word with 
the syntactic and semantic classification of the sentence [2][3] 
[11]. Such analysis leads to the development of a KR structure. For 
an order free language like Sanskrit, processing is quite interesting, 
as suffix based analysis reveals syntacto-semantic features of 
the sentence. Sanskrit has been analyzed from a computational 
perspective on vedic text [7], and the capability of pAninI 
grammar is equivalent to a finite state machine [8]. Development 
of an automatic segmentizer is an effort in this field [13]. Hindi and 
Arabic clauses have also been analysed from the pAninian aspect [14]. 
The parallelism of pAninI in the field of computer science is well 
explained [15]. The rule based POS tagger developed at JNU, Delhi 
uses a lexicon and displays all possible outcomes for conflicting 
cases [9]. This paper explains the processing of Sanskrit for 
classifying words into one of six semantic roles defined by pAninI 
under kAraka theory, implementing a novel approach: a Neural 
Network. 

Generally, a dictionary of words is maintained and each word is 
mapped to find its respective syntactic category. As pAninI has 
identified the syntacto-semantic information of the word by the 
suffix attached to the word, a lexicon of suffixes is sufficient for 
extracting features, instead of maintaining a dictionary of words. 
kAraka roles are similar to the case based semantics required for 
event-driven situations, where entities like agent, object and 
location are identified with respect to each event [6] [10]. 
pAninI, an ancient Sanskrit grammarian, has given nearly 4000 
rules called sutra to describe the behavior of the language in the 
book called asthadhyAyi, meaning eight chapters [10]. The ancient 
kAraka theory rules are in parallel with a finite state machine 
[8], and the concept is being extended to the English language [20]. It 
describes a transformational grammar which applies a sequence of 
rules to transform a root word into a number of dictionary words. 
From a small set of root words, millions of words are generated 
by firing a set of rules. For a highly inflectional language like 
Sanskrit, a sequence of declension tables is memorized in such a 
way that similar ending words follow the same declension. 
Hence, if one table is memorized, a number of words can be 
generated if their base word falls under the same group. This 
structural representation in optimum form is used to identify the 
semantic class of the word. In the Sanskrit language, the fundamental 
six roles, given by pAninI as kAraka values, are the key semantic 
components of a sentence, as described in Table 1. 





Table 1: Six kAraka in pAninian model. 


Suffix driven analysis is performed by mapping the suffix to a 
database which contains each suffix and a key number. The key is 
designed in such a way that it contains all the syntactic 
information as per the grammar of the language. 


3. KEY NUMBER DESIGN FOR SUFFIX 

All the nouns in the language follow nominal declension tables 
for each category of word. For example, all ‘a’ ending words 
follow the declension given in Table 2 with the word ‘rAma’, and 
Table 3 shows the suffix attached to the word. 

Am | AbhyAm 
rAmAbhyAm | rAmebhyH 
rAmAbhyAm | rAmebhyH 
6 | Asya | oH | Arm 

Table 3: Suffix for all a ending word 
Each row corresponds to a vibhakti value and each column represents 
the number or vachan. Unlike the English language, which 
contains only singular and plural, Sanskrit has singular as 
ekvachan, plural as bahuvachan, and two in number is labeled 
as dwivachan. Vibhakti is related to kAraka values. Suffixes 
present in the first row or vibhakti are kartA kAraka (agent). 
Likewise, each row represents a kAraka role as given in Table 
3.1. The sixth vibhakti is not included in Table 3.1, as it is sambandh 
kAraka, which has a relation with its immediate ent and is not 
related to the verb directly; hence it is not considered a kAraka by 
pAninI. 
A four digit key number scheme for noun suffixes is designed as 
given in Fig 2. 


Figure 2: Four digit number scheme for noun 

The type of ending is in the first column, where 9 different types of 
ending are considered, with values in the range 1-9. Genders are 
masculine, feminine and neuter, with values 1, 2 and 3. Seven 
vibhakti from 1 to 7 and three numbers from 1 to 3 are 
considered. For example, the suffix ‘am’ is present in the second row, 
first column as given in Table 2.2; it is assigned the value 1121, 
where the description of each digit is as follows: 

1: a ending 

1: Masculine gender 

2: dvitIyA vibhakti 

1: ekvachan 
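
As an illustration, the four-digit noun key can be decoded as below. This is a minimal sketch: the function and dictionary names are hypothetical, and the vibhakti-to-kAraka table is the commonly cited default correspondence, assumed here because the paper's own mapping table is not reproduced in this text.

```python
# Digit order follows the scheme in the text: ending type, gender, vibhakti, vachan.
GENDER = {1: "masculine", 2: "feminine", 3: "neuter"}
VACHAN = {1: "ekvachan", 2: "dwivachan", 3: "bahuvachan"}

# Assumed default correspondence; the sixth vibhakti (sambandh) is left out
# because the text does not treat it as a kAraka.
VIBHAKTI_TO_KARAKA = {
    1: "kartA", 2: "karma", 3: "karaNa",
    4: "sampradAna", 5: "apAdAna", 7: "adhikaraNa",
}

def decode_noun_key(key):
    """Split a four-digit noun suffix key into its four fields."""
    digits = str(key)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("noun key must have exactly four digits")
    ending, gender, vibhakti, vachan = (int(d) for d in digits)
    if not (1 <= ending <= 9 and gender in GENDER
            and 1 <= vibhakti <= 7 and vachan in VACHAN):
        raise ValueError("digit out of range for the noun scheme")
    return {
        "ending_type": ending,
        "gender": GENDER[gender],
        "vibhakti": vibhakti,
        "vachan": VACHAN[vachan],
        "karaka": VIBHAKTI_TO_KARAKA.get(vibhakti),  # None for the sixth vibhakti
    }

decoded = decode_noun_key(1121)
print(decoded)
```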
On a similar guideline, a five digit number scheme is designed for 
storing verb suffixes [16]. In Sanskrit grammar, verbs are 
classified into ten groups called gan, represented by the most 
significant place in the number scheme. When a root word joins 
with the suffix (pratya), some changes take place at the 
junction. With respect to these changes, verbs are classified into 
the different gan. The second digit from the left denotes pad, which 
occurs in three different forms: Atmnepad, parasmaipad and 
ubhaypad. Verbs whose outcome is for another person fall under 
parasmaipad, and verbs whose outcome is for oneself come 
under Atmnepad. Verbal words whose outcome is for both the 
other person and oneself come under ubhaypad. The time in 
which the action takes place is given in various tenses. There are 
10 different tenses in Sanskrit [5]. The context of the 
person is given by purush and the number by vachan. A five digit number 
scheme for verbs is presented in Fig 3. 


Figure 3: Five Digit Number Scheme for Verb Suffix 


Verbs in Sanskrit decline with respect to gan, pad, tense, 
person and number. These are influential parameters as they 
govern the behavior of the nouns in the sentence. Range and 
example values associated with each field are described in 
Table 4. 


Table 4: Range and Example Values for Each Digit Position 
of Verb Number Scheme 









For example, paThati (to read) has the suffix ti, which extracts the 
number 11011 from the database, thereby giving the following 
information: 

1-bhavAdigan, 

1-parasmaipad, 

0-latlakAr (Present tense); 

1-pratham puruSh (Third Person); 

1-ekvachan (singular). 
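
The decoding of the verb key in this example can be sketched as follows. The type and function names are hypothetical. The key is handled as a string so that a leading digit is never lost, and no labels beyond the digit positions named in the text are assumed.

```python
from typing import NamedTuple

class VerbKey(NamedTuple):
    """Fields of the five-digit verb suffix key, from most significant digit."""
    gan: int
    pad: int
    tense: int
    purush: int
    vachan: int

def decode_verb_key(key: str) -> VerbKey:
    """Split a five-digit verb key such as '11011' into its named fields."""
    if len(key) != 5 or not key.isdigit():
        raise ValueError("verb key must be a string of exactly five digits")
    return VerbKey(*(int(d) for d in key))

pathati_key = decode_verb_key("11011")
print(pathati_key)
```

For the key of paThati this yields gan 1, pad 1, tense 0, purush 1 and vachan 1, matching the list above.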
Pronouns decline in a manner similar to nouns, and the total number 
of pronouns is small; hence the set of most commonly used pronouns 
is stored in a separate database with their key values [17]. 
The number scheme for pronouns is given in Fig 4. 

Figure 4: Four Digit Number Scheme for Pronoun 

P_number identifies a particular pronoun like sarva, sH etc. For 
example, 1 is given to tad, meaning ‘that’ in English. The rest of the 
digits have the same values as for nouns. All the pronouns stated in 
rachanAnuvAdakaumudI are considered, with nearly 400 entries 
in the database [5]. Some words do not change their form 
under any condition; they are termed avyay. A list of these 
words is maintained separately. 


4. ALGORITHM FOR CASE FRAME 

The objective is to generate the CF by identifying the semantic role 
(kAraka value) of each word with respect to the action in a given 
sentence. Every word within the sentence is searched in the avyay 
list. On an unsuccessful search in the list, the word is mapped in the 
pronoun, verb suffix and then noun suffix databases. This order 
has been followed as the quantity of words in each category 
follows an ascending order. 
After the check in the avyay list, word w_i is mapped in the pronoun 
database. If found, its semantic role is identified by extracting 
the 4th digit from its key and performing vibhakti-kAraka 
mapping on it. Otherwise the next look up is performed on the verb 
database (VDB), followed by the noun database (NDB). 
The string is processed in reverse order, from right to left. The last 
character of the word w_i is identified (x), and all the suffixes 
which have x as their last character are extracted from the VDB 
and stored in a set. If any value from this set 
matches as a suffix in word w_i, it is added to the list of mapped 
suffixes. If this list contains one element then a unique mapping 
has been found; otherwise multiple matches are discovered. In case 
of multiple matches, the splitter algorithm is activated to check for 
the category of the word. The splitter breaks the word into base word and 
suffix. If the base word is present in the lexicon of verbal bases, the current 
word is tagged as the action entity. If no match is found in the verbal 
bases, then the check for noun is performed. 
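
The verb look-up just described can be sketched in Python. The suffix list and the verbal-base lexicon are hypothetical stand-ins for the VDB and the lexicon, and the splitter here is a plain string cut that ignores any change at the junction of base and suffix.

```python
def candidate_suffixes(word, suffix_db):
    """Suffixes from the database whose last character equals the word's last character."""
    x = word[-1]
    return [s for s in suffix_db if s.endswith(x)]

def matching_suffixes(word, suffix_db):
    """Candidates that really occur at the end of the word."""
    return [s for s in candidate_suffixes(word, suffix_db)
            if word.endswith(s) and len(s) < len(word)]

def splitter(word, suffix):
    """Break the word into (base, suffix); junction changes are not modelled."""
    return word[:-len(suffix)], suffix

def tag_verb(word, verb_suffix_db, verb_bases):
    """Return (base, suffix) if some split yields a known verbal base, else None."""
    for suffix in matching_suffixes(word, verb_suffix_db):
        base, _ = splitter(word, suffix)
        if base in verb_bases:
            return base, suffix
    return None

# Hypothetical miniature database and lexicon, for illustration only.
verb_suffix_db = ["ti", "si", "anti", "H"]
verb_bases = {"paTha"}
result = tag_verb("paThati", verb_suffix_db, verb_bases)
print(result)
```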
A similar mapping process is also performed for nouns, but this 
mapping process faces a problem, as the occurrence of a suffix in the 
database is not unique; at times multiple matches are obtained. 
This is due to intergroup and intragroup redundancy of suffixes. 
Occurrence of the same suffix within one declension table is 
intragroup redundancy, and occurrence of the same suffix across 
tables is intergroup redundancy. Depending upon frequency of 
occurrence and redundancies, suffixes are divided into three 
classes. Class I identifies unique occurrence of a suffix, class II 
identifies intergroup redundancy and class III identifies 
intragroup redundancy. 
Class 1: Unique suffix 
    Suffix with frequency of occurrence = 1. 
    Format of the data is 
    (<key number>, <suffix>, <frequency of occurrence>) 
    Example: (1111, ’H’, 1) 
Class 2: Intergroup redundancy 
    Suffix with frequency of occurrence greater 
    than 1 and same kAraka value. 
    Example: (1131, ’en’, 2) (1331, ’en’, 2), 
    suffix ‘en’ has the same kAraka value 3. 
Class 3: Intragroup redundancy 
    Suffix with frequency of occurrence greater 
    than 1 and different kAraka values. 
    Example: (1112, ’au’, 4) (1122, ’au’, 4) 
    (2171, ’au’, 4) (3171, ’au’, 4), 
    suffix ‘au’ has kAraka values 1, 2, 7, 7. 
The kAraka value is the 3rd digit in the key from the left. Categorization 
of the suffixes is presented in Fig 5. 
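
The three-way classification can be reproduced from the (key, suffix) entries alone, as in the sketch below. The function names are hypothetical; the entries are the examples quoted in the text.

```python
from collections import defaultdict

def karaka_digit(key):
    """Third digit from the left of a four-digit key."""
    return (key // 10) % 10

def classify_suffixes(entries):
    """Map each suffix to class 1, 2 or 3 from its frequency and kAraka values."""
    keys_by_suffix = defaultdict(list)
    for key, suffix in entries:
        keys_by_suffix[suffix].append(key)
    classes = {}
    for suffix, keys in keys_by_suffix.items():
        if len(keys) == 1:
            classes[suffix] = 1   # unique occurrence
        elif len({karaka_digit(k) for k in keys}) == 1:
            classes[suffix] = 2   # repeated, same kAraka value
        else:
            classes[suffix] = 3   # repeated, different kAraka values
    return classes

entries = [
    (1111, "H"),
    (1131, "en"), (1331, "en"),
    (1112, "au"), (1122, "au"), (2171, "au"), (3171, "au"),
]
classes = classify_suffixes(entries)
print(classes)
```

Only class 3 suffixes need the conflict resolution step, since for class 2 every matching key leads to the same kAraka value.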


All x-ending suffixes s_j are extracted from the NDB and stored in a 
set. If a suffix belongs to class I, it is easily given a kAraka role. If 
more than one suffix is present, then for every suffix s_j, word w_i 
is split into base word and suffix using the splitter algorithm. This 
base word is searched in the lexicon of base words. If a match is 
found, then s_j and its key number are stored as the final suffix and 
final number in a list. Class II type intergroup redundancy of 
the suffixes is handled by the splitter routine. Ambiguity related 
to class III may still exist. To resolve such cases, conflict 
resolution using a neural network is applied. Vibhakti-kAraka 
mapping is applied to the 4th digit of this number and a semantic 
tag is assigned to the word. All the words with their semantic 
tags are stored in the case frame. 

BIJIT - BVICAM’s International Journal of Information Technology 
Unique occurrence, Freq_of_occ(suff) = 1: Class I 
Multiple occurrence, Freq_of_occ(suff) > 1: 
    Same kAraka value: Class II 
    Different kAraka value: Class III 

Figure 5: Categorization of Suffix with Respect to their 
kAraka Value 
Out of the total 252 suffixes for nouns, the number of suffixes 
belonging to each class is given in Table 5. 

Table 5: Class wise Quantification of Data 

Algorithm for hybrid model 

vlist: lexicon of verb bases 
nlist: lexicon of nominal bases 
w_i: word under process 
s_j: suffix 
x: last character of word under process 
verb_xmatch: set of all suffixes ending with x 
verb_suffix: set of all suffixes in verb_xmatch which map as 
suffix in word w_i 
noun_xmatch: set of all suffixes ending with x 
noun_suffix: set of all suffixes in noun_xmatch which map as 
suffix in word w_i 
frame: structure for storing elements like action, agent, object 
etc. 

Pseudocode 

Input: sentence S. 
for each word w_i ∈ S 
    match w_i in pronoun database 
    if match found then 
        w_i.cat = pronoun 
        get key number of w_i from database 
        perform vibhakti-kAraka mapping 
        frame.<case> = w_i 
        continue with the next word 
    else 
        x = last_char(w_i) 
        verb_xmatch = all suffixes s_j from VDB such that 
            last_char(s_j) = x 
        verb_suffix = suffixes in verb_xmatch that 
            appear in word w_i as suffix 
        if verb_suffix != NULL then 
            for each s_j ∈ verb_suffix do 
                (base, s_j) = splitter(w_i, s_j) 
                if base ∈ vlist then frame.action = w_i 
            end for 
        endif 
        if match not found in VDB, then check in NDB 
            noun_xmatch = all suffixes s_j from NDB such 
                that last_char(s_j) = x 
            noun_suffix = all suffixes in noun_xmatch that 
                appear in word w_i as suffix 
            num_set = respective key numbers of matched 
                suffixes 
            if noun_suffix != NULL then 
                for each s_j ∈ noun_suffix and num_j ∈ num_set 
                    (base, suffix) = splitter(w_i, s_j, num_j) 
                    if base ∈ nlist then 
                        identify the vibhakti of the word 
                        if class = 1 then one value of vibhakti is obtained 
                        else if class = 2 then one value of vibhakti is obtained 
                        else if class = 3 then more than one value of vibhakti is 
                            obtained; call for conflict resolution using NN 
                        perform vibhakti-kAraka mapping 
                        store the kAraka value as semantic role of the word 
                        frame.<case> = w_i 
                end for 
            endif 
return frame 
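
A compact Python sketch of this pseudocode is given below. It is illustrative only: all databases and lexicons are tiny hypothetical examples, the vibhakti is read from the third digit of the key as in the noun scheme, the vibhakti-to-kAraka table is the commonly cited default correspondence, junction changes are ignored, and `resolve_conflict` is a placeholder for the trained neural network.

```python
VIBHAKTI_TO_KARAKA = {1: "kartA", 2: "karma", 3: "karaNa",
                      4: "sampradAna", 5: "apAdAna", 7: "adhikaraNa"}

def vibhakti_of(key):
    """Third digit from the left of a four-digit key (assumed position)."""
    return (key // 10) % 10

def suffix_matches(word, suffixes):
    """Suffixes ending with the word's last character that also end the word."""
    x = word[-1]
    return [s for s in suffixes
            if s.endswith(x) and word.endswith(s) and len(s) < len(word)]

def build_case_frame(words, pronoun_db, verb_suffixes, verb_bases,
                     noun_db, noun_bases, resolve_conflict):
    frame = {}
    for word in words:
        if word in pronoun_db:                       # pronoun look-up first
            role = VIBHAKTI_TO_KARAKA.get(vibhakti_of(pronoun_db[word]))
            if role:
                frame[role] = word
            continue
        if any(word[:-len(s)] in verb_bases          # then the verb database
               for s in suffix_matches(word, verb_suffixes)):
            frame["action"] = word
            continue
        options = set()                              # finally the noun database
        for s in suffix_matches(word, noun_db):
            if word[:-len(s)] in noun_bases:
                options.update(vibhakti_of(k) for k in noun_db[s])
        if not options:
            continue
        ordered = sorted(options)
        vibhakti = ordered[0] if len(ordered) == 1 else resolve_conflict(word, ordered)
        role = VIBHAKTI_TO_KARAKA.get(vibhakti)
        if role:
            frame[role] = word
    return frame

# Hypothetical data; the lambda merely stands in for the neural network.
noun_db = {"H": [1111], "am": [1121, 1311, 1321]}
frame = build_case_frame(
    ["devH", "vanam", "gachchhati"],
    pronoun_db={}, verb_suffixes=["ti", "si"], verb_bases={"gachchha"},
    noun_db=noun_db, noun_bases={"dev", "van"},
    resolve_conflict=lambda word, options: options[-1],
)
print(frame)
```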





Results of the algorithm are discussed in the last section; 
conflict cases under class III are resolved using a neural network, 
as discussed in the next section. 


6. CONFLICT RESOLUTION USING NN 

For nouns in Sanskrit, intragroup redundancy of type III can be 
resolved using either statistical methods or an NN based method. 
Statistical techniques require a large corpus of data; due to the lack of 
large size data with good vocabulary coverage, NN is 
implemented for conflict resolution. The back propagation algorithm 
with three layers is used to train the system for conflict cases. A 
set of pre annotated text is prepared which contains the suffix, 
category and vibhakti for each word of the sentence. A sample of the 
annotated text is presented in Fig 7. 


tvam/am.p.1 bhojanam/am.n.2 pachasi/si.v.x.pach| 
devH/H.n.1 vanam/am.n.2 gachchhati/ti.v.x.gachch| 
hariH/H.n.1 putrAya/Aya.n.4 bhojanam/am.n.2 pachati/ti.v.x.pach| 
rathavAhakH/H.n.1 aShvebhyH/ebhyH.n.4 ghAsam/am.n.2 anayati/ti.v.x.anaya| 
idam/am.p.1 chAtrasya/asya.n.6 pustakam/am.n.2 asti/ti.v.x.as| 
devH/H shakten/en.n.7 gRAmam/am.n.2 gachchhati/ti.v.x.gachch| 
sH/H.p.1 mitre/e.n.7 vishvAsam/am.n.2 

Figure 7: Sample annotated data 
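
The sample suggests a token layout of word/suffix.category.vibhakti for nouns (n) and pronouns (p), and word/suffix.v.x.base for verbs, with ‘|’ closing a sentence. Under that reading, which is an inference from the sample and not a stated specification, the annotated text could be parsed as sketched below; the function names are hypothetical.

```python
def parse_token(token):
    """Parse one annotated token into its fields; missing fields stay None."""
    token = token.rstrip("|")
    word, _, tags = token.partition("/")
    parts = tags.split(".") if tags else []
    entry = {"word": word, "suffix": parts[0] if parts else None,
             "category": None, "vibhakti": None, "base": None}
    if len(parts) >= 2:
        entry["category"] = parts[1]
    if entry["category"] == "v":
        if len(parts) >= 4:
            entry["base"] = parts[3]          # verbs carry the verbal base
    elif len(parts) >= 3 and parts[2].isdigit():
        entry["vibhakti"] = int(parts[2])     # nouns and pronouns carry the vibhakti
    return entry

def parse_sentence(line):
    """Parse a whitespace-separated line of annotated tokens."""
    return [parse_token(t) for t in line.split()]

parsed = parse_sentence("tvam/am.p.1 bhojanam/am.n.2 pachasi/si.v.x.pach|")
print(parsed)
```

Tokens without tags, such as devH/H in the sample, are kept with only the word and suffix filled in.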


Most common ambiguous cases for vibhakti or kAraka value 
fall under four main domains: (1,2), (1,2,7), (3,4,5) and (4,5). The NN 
takes features of the corpus as input and gives the final vibhakti or kAraka 
value as output. Features selected for training the network are given 
in Table 8. 

An NN based system takes its input in numerical form; hence 
the word features are converted into suitable numerical values. 
Mapping of features into numerical values is shown in Table 9. 

Table 9: Sample set of encoded values 


Input coding algorithm reads the pre annotated text and 
generates the data for training the neural network. 

BPN is a feed-forward multilayer network consisting mainly of 
three layers. The algorithm uses two passes: a forward pass and a 
backward pass. In the forward pass, inputs are multiplied by their 
respective weights and a bias is added. The weighted sum of 
inputs along with the bias is fed as input to the hidden layer. The 
hidden layer uses a squashing function to limit the output value to 
the desired range. Output from the hidden layer is multiplied by 
the respective weights and fed to the output layer. A sigmoid function 
is used to limit the range of the output. The output obtained is 
compared with the desired value, and the error is calculated as the 
difference of the two. This calculated error is propagated back in the 
backward pass. To improve the performance of the network, 
weights are modified as a function of the propagated error. In the 
forward pass, weights of the directed links remain unchanged at 
each processing unit of the hidden layer. For n input values the 
weighted sum is obtained and the sigmoid function is applied to this 
weighted sum. 
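
The two passes described above can be sketched for a single training example as follows. This is an illustrative sketch only: the layer sizes, initial weights, learning rate, the bias on the output unit and the names make_network, forward and train_step are assumptions, not the configuration used in this work.

```python
import math


def sigmoid(x):
    """Squashing function that limits a value to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_network():
    """Small fixed network: 2 inputs, 2 hidden units, 1 output (assumed sizes)."""
    return {
        "w_hidden": [[0.1, -0.2], [0.3, 0.4]],
        "b_hidden": [0.0, 0.0],
        "w_out": [0.2, -0.1],
        "b_out": 0.0,
    }


def forward(x, net):
    """Forward pass: weighted sum plus bias, squashed at each layer."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    output = sigmoid(
        sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
    )
    return hidden, output


def train_step(x, target, net, lr=0.5):
    """One forward pass followed by one backward pass; returns the error."""
    hidden, output = forward(x, net)
    error = target - output
    # Error terms use the sigmoid derivative y * (1 - y).
    delta_out = error * output * (1.0 - output)
    delta_hidden = [
        delta_out * w * h * (1.0 - h) for w, h in zip(net["w_out"], hidden)
    ]
    # Weights are modified as a function of the propagated error.
    for j, h in enumerate(hidden):
        net["w_out"][j] += lr * delta_out * h
    net["b_out"] += lr * delta_out
    for j, ws in enumerate(net["w_hidden"]):
        for i, xi in enumerate(x):
            ws[i] += lr * delta_hidden[j] * xi
        net["b_hidden"][j] += lr * delta_hidden[j]
    return error
```

Calling train_step repeatedly on the encoded training data reduces the error between the obtained and the desired output.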

The time taken to train the network is directly proportional to the 
size of the data. If the number of neurons is increased, training time 
increases. Classification of pAnini kAraka with NN requires a 
large corpus for training. The hybrid model overcomes the 
problem of long training time by classifying words by 
their vibhakti value in non-conflicting situations and applying 
NN in conflicting situations only. This requires a small set 
of data, and the network is trained for conflicting classes only, 
thereby reducing the time. All the cases under the same conflicting 
domain require the same network. The major conflicting domains are 
(1, 2), (3, 4, 5) and (6, 7). As the data set for each conflicting case 
focuses on a limited set of suffixes, a small data size is sufficient. For 
example, the am-ending suffix falls in two classes, 1 and 2; the NN was 
trained on these cases and the results are presented in the next 
section. 
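
The division of labour in the hybrid model can be sketched as a simple dispatch. The rule table RULES, the conflict table CONFLICTS and the function classify are hypothetical names; the suffix entries are taken from the sample in Figure 7, and the (4, 5) domain for ebhyH is an assumption. The trained networks are represented by plain callables.

```python
# Hypothetical rule table: unambiguous suffix -> vibhakti value.
RULES = {"Aya": 4, "asya": 6}

# Ambiguous suffixes and the conflicting domain each one falls in (assumed).
CONFLICTS = {"am": (1, 2), "ebhyH": (4, 5)}


def classify(suffix, features, conflict_models):
    """Use the rule when the suffix is unambiguous, else the NN for its domain."""
    if suffix in RULES:
        return RULES[suffix], "rule"
    if suffix in CONFLICTS:
        domain = CONFLICTS[suffix]
        # All cases under the same conflicting domain share one network.
        model = conflict_models[domain]
        return model(features), "nn"
    return None, "unknown"
```

Only the suffixes listed in the conflict table ever reach a network, which is why a small training set per domain is enough.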


7. RESULT AND DISCUSSION 
Training time for each conflicting domain is reported in Table 
10. 





Table 10: Performance of NN for Various Data Set 
Training Size=50 Test Data Size=10 
Fifty sentences are used in the training phase and ten sentences in 
the testing phase. As depicted in Table 10, 90% accuracy is 
achieved in the am and ebhyH domains. Training of the network for the 
am and abhyam conflicting cases is shown in Fig 8 and Fig 9. 


Figure 8: Training graph for am data set 

Figure 9: Training graph for abhyam data set 

After training the network for conflicting cases, the algorithm is 
tested on 100 sentences and the accuracy of the output obtained is 
calculated by finding the F-score as given in Eq 6.1. 


F_score = 2(p × r)/(p + r)    (6.1) 

It uses precision (p) and recall (r) to compute the score. 
Precision (p) = Number of correct results / Number of returned 
results 
Recall (r) = Number of correct results / Number of results that 
should have been returned. 

F_score is the harmonic mean of precision and 
recall and is calculated as given in Eq (6.1). The F-score of each 
class under NN is given in Table 11. 
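
As a worked illustration of Eq (6.1), the score can be computed from three counts. The function name f_score and its argument names are chosen here for illustration, and the counts in the usage note are made-up numbers, not results from this work.

```python
def f_score(correct, returned, expected):
    """F-score from counts: correct results, returned results, expected results."""
    if returned == 0 or expected == 0:
        return 0.0
    p = correct / returned   # precision
    r = correct / expected   # recall
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)
```

For instance, 9 correct results out of 10 returned, with 12 expected, gives p = 0.9, r = 0.75 and an F-score of about 0.82.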

f = frequency of occurrence 

n = number present in the data 

o = number correctly identified 


Table 11: Result for Hybrid Classification 


The hybrid model is a better approach for semantic classification 
than a pure rule based system or a pure NN based system. 
Performance of the NN depends on the size of the annotated corpus 
available for training and on its coverage of the vocabulary and 
suffixes. As this is a suffix driven analysis, the annotated corpus must 
include the suffix attached to each word. Due to the lack of 
a corpus with good coverage, the results of the pure NN 
based system lag behind those of the hybrid system. The hybrid model 
exploits the potential of the rules of the grammar and handles 
conflicting situations with the NN model. The requirement of a large 
corpus is reduced, as the corpus is designed for conflicting 
cases only. A pure rule based system requires in-depth knowledge 
and understanding of a complex set of recursive rules and meta rules 
for transformations and exceptional cases. A person with good 
computational skill along with complete pAninian knowledge is 
difficult to find. 
Case frames for 100 sentences are generated by the three 
algorithms, and accuracy is checked at word level and sentence 
level. Word level accuracy is the correctness of the semantic tag 
assigned to each word, and sentence level accuracy is the 
correctness of the generated case frame. Word level accuracy is 
discussed in Tables 5.3, 5.6 and 5.8. For sentence level 
accuracy, the accuracy of the case frame is calculated. Accuracy of 
a case frame depends upon two parameters: 
* Number of significant words from a sentence appearing in 
the case frame (x) 
* Number of words tagged correctly (y) 




For the hybrid model, sentence level accuracy is calculated and 
presented in Table 12. 


Table 12: Case Frame Accuracy for 100 Sentences under 
Hybrid Model 
Out of 72 case frames (CF) containing all significant words of their 
sentences, 56 are correctly tagged, giving an accuracy of 77 %. This is 
so far the best result, as none of the NLP processors for Sanskrit has 
worked on knowledge representation (KR) tool generation for the language. 

Use of NN in NLP is less frequent due to the complexity prevailing 
in both domains [1]. The Sanskrit language has a rich inflectional 
morphological structure suitable for computational processing. 
Tabular declension of words, with each syntactically and semantically 
significant suffix occupying a predefined cell position, paves the 
way for a well structured knowledge representation mechanism. 
Identifying the semantic class of a word by suffix driven 
analysis under the pAninian concept was the main theme of the work. 
Use of NN for resolving conflicting kAraka roles under the pAninian 
framework appears to be a better mechanism for semantic 
labeling of words. Initial identification is a baseline model 
upon which further extensions can be developed. Enhanced 
corpus with better coverage can further improve the results. 


Abstract - This is a case of effective and efficient e- 
Governance in which licenses are issued electronically, 
using IT for web based service delivery in the Directorate 
General of Foreign Trade (DGFT). Nowadays the complete 
licensing procedure is handled electronically, i.e. the 
Exporter/Importer applies electronically for the Importer 
Exporter Code (IEC), submits the fees as well as the 
necessary documents electronically and receives the IEC 
electronically. The trading community also uses the online 
facility to submit applications for any licensing scheme, 
deposit the licensing fees and enclose the required 
documents from their end. The official procedure is also 
automated, including initiating the note sheet, generating the ecom 
number, consolidation of license fees and issuing the license to 
the exporter. Customs has a fifty percent role in 
trading, so an Electronic Data Interchange (EDI) facility has 
also been established with Customs. In addition to the above services, 
the Bank Realization Certificate (BRC) is also integrated with 
this system. An Exporter/Importer is thus provided 
with electronic services without visiting the offices of 
DGFT, Customs and the Bank. The web has been strategically 
leveraged for reengineering and transformation of trade 
processes for economic trade facilitation. 


Index Terms - E-Licensing, DGFT, E-Governance 


1. INTRODUCTION 

NIC-DGFT (Commerce and Industry Informatics Division) is 
playing a significant role in architecting and implementing e- 
Governance initiatives with the best possible technology 
support in the Directorate General of Foreign Trade (DGFT). 
DGFT is a countrywide organization responsible for 
increasing the exports of the country, as discussed in [12]. 
Appropriate backbone ICT infrastructure has been established 
in DGFT, which includes OFC-based Internet connectivity with 
a Gigabit-based Local Area Network (LAN), Video 
Conferencing, an IT equipped help desk, etc., supported by a team 
of highly qualified IT professionals. Various e-Governance 
models, defining the e-Governance 
indicators and parameters which have been implemented in 
different forms, are described in [1-11]. 'Trade Facilitation' is a key 
determinant of a country's competitiveness in the international market, 
so traders have been keen to become familiar with it. Over the years, 
the Government of India has taken various initiatives to simplify 
and rationalize procedural complexities in exports in order to 
put in place an efficient and effective trade facilitation 
mechanism and reduce the implicit transaction costs associated 
with the enforcement of legislation, regulation and 
administration of trade policies involving several agencies such 
as Customs, Airport and Port Authorities, banks, the trade ministry, 
etc. The transaction cost has been evaluated at about 8 to 10% 
of the value of exports, and any reduction in this has a 
permanent benefit accruing to the exporters. 

NIC-DGFT has played a catalytic and significant role in 
implementing the e-Governance project in DGFT with the aim 
of leveraging IT for transparency and better governance. Keeping 
this objective in view, the Directorate General of Foreign Trade has set 
up an online trade facilitation system. EDI 
interfaces with the trade partners and all concerned in the 
value chain have been established. Customs, Banks, Trade and 
Industry and other Government Agencies are part of this 
mechanism. Electronic Data Interchange (EDI) is the core driver 
for facilitating international trade, and one of the key initiatives 
is electronic transmission of foreign exchange realization 
details on exports by banks on a daily basis under the 
Electronic Bank Realization Certificate (e-BRC) initiative. 
The exporter will not be required to make any request to the Bank for 
issuance of a Bank Export and Realization Certificate (BRC). 
This will establish seamless EDI connectivity amongst 
DGFT, Banks and Exporters. This is a significant step to reduce 
the transaction cost to the exporters. 


2. OBJECTIVES OF DGFT 
The major objectives of DGFT are as follows: 
2.1. Effective and efficient e-governance services. 

2.2. Global accessibility of the e-services. 

2.3. Maintaining the integrity of public services. 

2.4. Reduction in transaction cost and time. 

2.5. Elimination of fraudulent practices in trade and industry. 

2.6. Reduction of physical visits by exporters to the office to a 
minimum. 

2.7. Publishing of the Monthly License Bulletin. 

2.8. Implementation of a single common document for the 
trade. 

2.9. Moving DGFT to a paperless environment. 


3. ACHIEVEMENTS 
To achieve the above goals, the DGFT organization requires 
intensive use of ICT infrastructure. The online service has 
become a core implementation strategy for delivery of an 
efficient, transparent and easy to access service. For the 
implementation of powerful and successful e-Governance, the 
complete setup has been renovated in the following manner: 
3.1. DGFT has been automated in all respects. 
3.2. The DGFT web site is prepared and hosted annually, 
including the latest policies, procedures, Circulars, 
Notifications and Public Notices, etc. 
3.3. A web based application has been launched for the trading 
community to obtain licenses. 

3.4. A central data warehouse of license data has been created. 

3.5. The global trade facility is available round the clock 
through the DGFT Portal http://dgft.gov.in 

3.6. EDI with Customs is operational. 

3.7. A net banking facility is available to pay the 
licensing fees. An MOU between DGFT and 43 banks has 
been signed. 

3.8. A trading community of 5.5 lakh exporters 
and importers is served through the online facility 365x24x7. 

3.9. A professionally managed help desk is operational at 
DGFT HQ as well as at the regional offices. 

3.10. All 36 DGFT port offices are providing the trading 
facilities countrywide. 


4. MAJOR PARTNERS OF TRADE 
4.1 Customs: Message Exchange pertaining to various 
FTP schemes like Advance Authorizations (AA), Duty 
Entitlement Passbook (DEPB), Export Promotion 
Capital Goods (EPCG), etc. 
4.2 Banks: Message Exchange to obtain Foreign 
Exchange realization against exports (under 
implementation) 
4.3 Export Promotion Councils (EPCs): Message 
Exchange / uploading of membership details of 
registered exporters (e-RCMC) 


5. KEY TECHNICAL ATTRIBUTES OF DGFT'S 
ONLINE SERVICES 

All processes and procedures have, therefore, been 
reengineered leveraging web technology. The capability, 
flexibility and management of DGFT's website are vital to the 
process of trade facilitation. 
The four major key attributes of DGFT's website are: 

5.1 A broad application filing spectrum 

5.2 Security Features 

5.3 Web Management 

5.4 Technology 

The above key attributes of DGFT's website are indicated 
in the following schematic (Figure 1). 


6. CITIZEN CENTRIC APPROACH 

6.1 A web based operational environment is made available 
for trade policy and procedure implementation 
globally, on a 24x7x365 basis for all citizens. 

6.2 The DGFT Head Quarter, with all 36 regional offices 
spread across the country (but virtually being one), 
provides the online trading facility at the user end. 

6.3 An e-Licensing facility is available for almost all schemes, 
such as Advance Authorization (AA), Duty Entitlement Passbook 
(DEPB), Export Promotion Capital Goods (EPCG), 
Focus Product Scheme (FPS), Focus Market 
Scheme (FMS), Vishesh Krishi and Gram Udyog 
Yojna (VKGUY) Scheme, Status Holder Incentive 
Scrip (SHIS) Scheme, Market Linked Focused Product 
(MLFPS) Scheme, Served from India Scheme (SFIS) 
and Agri Infrastructure Incentive Scrip (AIIS) Scheme, 
etc. 

6.4 On the DGFT web site http://dgft.gov.in a facility has 
been provided to search/enquire about the current 
Import Policy of an item by entering either the ITC (HS) 
Code of that item or a brief description of the item. 
This is of major help to trade and industry as 
well as to academicians and researchers. 

6.5 The organization has undertaken a thorough revision of 
the Foreign Trade Policy / Handbook of Procedures 
electronically to make it more user friendly. 
Substantial efforts have been made to remove 
ambiguities in language, delete repetitions and 
harmonize the text with amendments to policy and 
new policy announcements. 

6.6 An extremely challenging and significant EDI 
initiative, e-BRC, has been launched by DGFT. It 
would herald electronic transmission of Foreign 
Exchange Realization from the respective Banks to 
DGFT's server on a daily basis. This is in addition to EDI 
linkages with Trade and Industry, Government 
Agencies and related EDI community partners, i.e. 
Customs, EPCs, etc. e-BRC would facilitate early 
settlement and release of FTP incentives/entitlements 
for the exporters/importers. 


7. TRANSITIONAL COMPATIBILITIES 

7.1 The 'on-line' filing facility is user friendly, with data input 
through structured screens, access controlled by 
DSCs, an inbuilt facility to edit and validate before 
submitting data, and FAQs available to assist 
filing. 
7.2 Status of Authorization and Importer Exporter Code 
(IEC) 
7.3 Electronic Fee Transfer (EFT) 
7.4 Secure and automated EDI based environment with 
'on-line' EDI Message Exchange with community 
partners 
7.5 Covers all models of e-governance, i.e. B2G, G2G, 
G2B, G2C and C2G. 


8. SEARCH ENHANCEMENT 

A comprehensive, user friendly search facility is available 
on the web portal for people to search for any trade related 
information. Any Exporter/Importer may check the status 
of any Authorization as well as of the IEC at any time from 
anywhere. All trade related documents may be obtained 
through the menu. Latest updates of the Foreign Trade Policy 
and Procedure, RTI related information, Who's Who, the 
Citizen Charter, etc. are available on the DGFT 
site. All types of forms may also be downloaded as and 
when required. 
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9. WEB SECURITY FEATURES 

9.1 User authorization is through digital signatures. 
However, an option to log in with a user name and 
password also exists to provide flexibility. 
9.2 The Digital Signature includes embedded IEC details, 
which are validated when registered on DGFT's website 
and ensure a high level of security. DGFT has 
also recently migrated to 2048 bit encryption for a 
higher level of security. DGFT is geared to handle any 
changes which may be required after implementation 
of interoperability in issuance of Digital Signature 
Certificates (DSCs). 
9.3 The database and application servers are maintained behind a 
firewall. 
9.4 A three tier architecture is used for the application. 
9.5 Physical security is ensured by the NIC Data Center 
authorities. 


10. MESSAGE EXCHANGE BEHAVIOR WITH 
VARIOUS COMMUNITY PARTNERS 

The message exchange behavior with various community 
partners is shown in Table 1: 

Network Partner | Projects | Network Topology | Security | Message Exchange file format 
Customs | Shipping bill | | Access control through DSC | Flat file through FTP 
Banks | E-BRC | One to many | Offline control through DSC | XML file upload 
EPCs | E-RCMC | One to many | Offline control through DSC | XML file upload 
 | Integration with bank website | | Control through DSC | 


11. TECHNOLOGY ADOPTED 
For smooth functioning of the e-Governance project we have 
adopted an object oriented language (Java) with a 
supporting database (DB2) at the back end. The details of 
the technology tools are as follows: 

11.1. J2EE technology (Applet, Servlet, Enterprise Java 
Beans, JSP, etc.) and IBM DB2 as database, with 
digital signature. 

11.2. JBuilder 2007 and J2SDK/J2SE tools for application 
development. 

11.3. IBM WebSphere and Macromedia JRun Web Server as 
application servers. 

11.4. Rational Suite is used for documenting, 
designing and developing the application. The website is 
updated using Extensible Markup Language (XML) technology. 

11.5. The whole DGFT organization is connected to the internet 
/ intranet / VPN through very high speed connectivity 
with NICNET infrastructure. 

The continuous technology upgradation has not only 
prevented obsolescence but has kept our infrastructure robust 
from the security, capability, flexibility and compatibility 
perspectives. 


12. EDI/ONLINE FILING ERROR RESOLUTION 
SYSTEM 
12.1. The EDI Help Desk is manned by expert professionals 
and managed by the EDI Division to resolve EDI related 
complaints. 
12.2. Nodal officers have been nominated at 
DGFT / major RAs to monitor and resolve EDI related 
complaints from trade and industry. 
12.3. A tracking system has been established on the basis of 
a unique complaint number. An online complaint 
registration system is implemented. 


13. OTHER IMPORTANT INFORMATION LINKS 
13.1. Right to Information Act (RTI) 
13.2. Citizen Charter 
13.3. DGFT's Regional Offices, Ministry of Commerce & 
Industry, Directorate General of Commercial 
Intelligence & Statistics (DGCI&S), Central Board of 
Excise & Customs (CBEC), Special Economic Zone 
(SEZ), World Trade Organization (WTO), Customs 
Port Location Code 

13.4. Public Grievances 

14. ECONOMIC OUTCOMES 
By providing SMART services to the trading community through 
e-Governance, money is saved by each stakeholder. 
The public as well as the Government benefit from this application. 
The major factors are as below: 
14.1. Applications for licensing can be filed from anywhere. 
14.2. The status of an application can be tracked at the click of a 
button. 
14.3. Physical visits of exporters to the office have been 
reduced to a minimum. 
14.4. Interview through Video Conference saves on an 
average Rs. 50,000 per interview. 
14.5. Cost of stationery brought down by up to 80%. 
14.6. Reduction in paper work due to paperless operation in 
DGFT. 
14.7. Reduction in transaction cost. 
14.8. File preparation cost has come down to zero. 


15. TIME FACTOR OUTCOME 

15.1. Licensing application preparation time has come down 
from 5 hours to 5 minutes. 
15.2. Application processing time has also come down from 
45 days to 5 hours. 
15.3. Message exchange time for license verification has 
come down from 6 months to instant. 
15.4. The status of an application can be traced on an urgent basis. 
15.5. The complete process is very fast. 
15.6. Collection of license fees as well as their consolidation is 
very fast due to EFT implementation. 
15.7. Reduction in transaction time. 


16. GENERAL OUTCOME 
16.1. Fraudulent practice in trade and industry is eliminated. 
16.2. The entire process is transparent. 
16.3. Collection of license fees is easy and systematic. 
16.4. G2G, G2B, G2C, C2G and B2G models of e-Governance. 
16.5. Secured, automated EDI based environment. 
16.6. Implementation of single common documents. 


17. E-LICENSING 
TECHNOLOGY 

In India, over the last two decades, Information and 
Communication Technology (ICT) has emerged as an effective 
tool to deliver services to the people. Expansion of 
telecommunication infrastructure and penetration of the Internet 
in large parts of the country have enabled the government to provide 
effective, efficient and multichannel delivery of government 
services to citizens. Initially the emphasis of e-governance 
was on G2G services relating to automation and 
computerization of the internal functioning of the government; over 
the last few years the focus of e-governance has shifted to electronic 
delivery of services to citizens at their end. As 
interest in new and expanded e-governance increases, public 
managers find themselves making decisions about information 
and information technology for which they are often 
unprepared or ill-equipped. Recognition of the complexity and 
risk of the IT decisions facing public managers 
has spurred the development of many 
structured and rigorous tools to support IT business case 
analysis and risk assessment; such strategies are recommended in some 
government agencies and required in others, as referred in 
[11]. 

A gap analysis between a selected set of practitioner tools and a 
set of key success factors of IT initiatives has the potential to 
inform questions about the relationship between research and 
practice. A gap analysis strategy represents an opportunity to 
do a component-by-component analysis to determine the extent 
to which each decision reflects awareness of relevant 
research on information system success. The gap analysis 
comprises the following steps: 

17.1. A review of current literature in information system 
research is used to identify factors found to influence 
the success of IT initiatives. 

17.2. A set of tools used for government IT initiatives is 
identified and described. These tools are 
selected based on their visibility and central role in 
informing practitioners at the National Level. 

17.3. A comparison of the factors against the selected 
descriptions is conducted. 

17.4. An identification of the gap between the research and 
the practical tools is presented and discussed. 


18. EMERGING CHALLENGES FOR E-LICENSING 
Although providing numerous opportunities for better 
governance, globalization and ICT have also brought in many 
new challenges for e-Licensing, relating to information and data, 
information technology, organizational and managerial, legal 
and regulatory, and institutional and environmental factors. The 
major challenges may be classified in the following manner: 

18.1. Information and data challenges: e-Licensing 
initiatives are about the capture, management, use, 
dissemination and sharing of information. A number 
of challenges relate to the information that is at the 
core of e-Licensing initiatives. Information and data 
quality, security issues, technological incompatibility, 
technology complexity, technical skills and 
experience, technology newness, project size, 
management attributes and behavior, organizational 
diversity, lack of alignment of organizational goals 
and project goals, multiple or conflicting goals, restrictive 
laws and regulations, intergovernmental relationships, 
budget and political pressure, autonomy of agencies, 
etc. are the major challenges in implementing the e- 
Licensing application. 

18.2. Information Technology: Technology 
incompatibility has also been identified as one 
difficult challenge to the e-Licensing project. Very 
different and old systems increase the complexity of IT 
projects; complexity and newness of technology are 
also constraints that affect the result of IT projects. The 
lack of relevant technical skills as well as the shortage 
of qualified technical personnel within the project 
team has been found to be an important challenging 
factor. 

18.3. Organizational and managerial: The size of the 
project and the diversity of the users and organizations 
involved are two of the main challenges of the e- 
Licensing project. There is a lack of alignment 
between organizational goals and the existing project; 
in addition, individual interests and associated behaviors 
lead to resistance to change and internal conflicts. 

18.4. Legal and regulatory: Like most government 
departments, DGFT is created and operates by virtue 
of a specific formal rule or group of rules. In making 
any kind of decision, including those in this project, 
public managers take into account a large number of 
restrictive laws and regulations. 


19. RECOMMENDED RESEARCH METHODOLOGY 
TO OVERCOME THE CHALLENGES 

To achieve success in e-Licensing as an e-governance initiative, a 
set of strategies may be drawn up by mapping the challenge 
categories. This illustrates the degree of correspondence in the 
research itself between challenges and possible strategies for 
meeting those challenges: 

19.1. Information and data strategies: Information and 
data challenges require an overall plan for managing 
data and information products. A quality and 
compliance assurance program is an effective strategy 
for dealing with information and data challenges. 
Managers have attempted to minimize data related 
problems by sharing standards, definitions and meta 
data with their potential partners such as customs, banks, 
export promotion councils, etc. In addition, 
continual feedback from partners and users should be 
maintained. 

19.2. Information Technology Strategies: Attention to IT related 
issues, i.e. ease of use, usefulness, demonstrations and 
prototypes, etc., is a success strategy. A well skilled and 
respected IT leader, an expert project team, clear and 
realistic goals, identification of relevant stakeholders 
and user involvement, proper planning, good 
communication, clear milestones and measurable 
deliverables, adequate funding, best practice review, IT 
policies and standards, etc. are the key success 
strategies. 

19.3. Organizational and Managerial Strategies: For 
successful IT initiatives, having clear, realistic goals 
is an important factor. Identifying relevant stakeholders and 
getting their involvement in the project development 
process, especially that of end users, has also been found to be 
an effective strategy in overcoming the organizational 
and managerial challenges. Strategic planning 
can be seen as an umbrella for more specific 
strategies such as milestones and measurable 
deliverables and good communication channels. It is also 
extremely important to take care of developers' and 
end users' current skills and training needs. Successful 
projects need a balanced combination of technical and 
managerial skills and expertise among their members. 

19.4. Legal and regulatory Strategies: Restrictive laws 
and regulations developed prior to or in ignorance of 
technologies relevant to e-Licensing can affect the 
success of the project. Our strategy for responding to 
these challenges is to invest in changes to the 
regulatory environment that allow for or enable 
adoption of emerging technologies. Digital 
Signature Technologies, for example, required statutory 
changes in most jurisdictions before they could be 
adopted for use. Developing appropriate government 
wide IT policies and standards can also provide an 
adequate framework for e-government initiatives to be 
successful. 


20. FUTURISTIC RESEARCH TOOL 

e-Licensing is a key challenge for government today, as it 
involves multiple stakeholders and multiple processes and 
demands considerable co-ordination and collaboration as well 
as managerial and financial resources. We may adopt the 
following strategies: 

20.1. Promoting advanced ICT training, education and 
research as and when new technologies are conceived. 

20.2. Negotiating and influencing the proper adoption of 
international frameworks, norms and standards by 
participating actively in the governance of the global 
information economy. 

20.3. Documenting the best successes and worst failures for the 
benefit of knowledge. 

20.4. Promoting innovation and risk taking through fiscal 
concessions and availability of venture capital, 
creating an investment climate for domestic and 
foreign investment in the ICT sector. 

20.5. Developing a supportive framework for early adoption 
of ICT and creating a regulatory framework for ICT- 
related activities, e.g. fixed and mobile 
communication, e-commerce and internet services. 

20.6. Application of an Online Performance Tracing System. 

20.7. Implementation of an online Audit System. 

20.8. Integration of a Really Simple Syndication (RSS) system 
with the existing system for wider level simplification. 

20.9. Inclusion of the cloud computing concept as a futuristic 
approach. 

20.10. Adoption of Wi-Fi communication in the entire 
organization. 

20.11. User requirement analysis is a major tool for 
refinement of the project. 

20.12. User feedback analysis is also a powerful key factor for 
improvement of the project. 

20.13. Cost analysis is always a considerable measure for the 
project. 

21. CONCLUSION 

In this paper, I have presented the effects of e-Governance 
indicators in the Directorate General of Foreign Trade, Ministry of 
Commerce and Industries, Govt. of India, showing that the trading 
community avails the maximum facilities in minimum time 
from their own end within a transparent environment. The 
Government of India, Department of Electronics and 
Information Technology, has initiated the national e-governance 
plan for the execution of e-governance projects in the country. 
On the same pattern we have applied the e-Governance module 
in DGFT to move towards a paperless environment. The various 
outcomes are indicated to support effective and successful 
e-Governance. 




BIJIT - BVICAM's International Journal of Information Technology 


This is a case study of an exemplary e-governance project. The project 
has been highlighted in various e-governance seminars and workshops. 
It is the first govt. project in which ICT was implemented 
with digital signature and electronic fund transfer facility, in 
1998. Nowadays this office is operational in a paperless 
environment. 


Abstract - This paper presents a rectangular microstrip patch antenna 
integrated with a combination of pentagonal and hexagonal 
shaped structures etched at a height of 3.276 mm from the 
ground plane. It is demonstrated that the application of 
media with a negative refractive index, or metamaterial, 
eliminates the spurious harmonics (the unwanted 
dips that appear in the S11 graph) associated with the 
original structure. Furthermore, the return loss is improved by 
the inclusion of the metamaterial structure, reaching -27.1919 
dB compared with -10.1286 dB achieved by the original patch 
antenna structure alone. The main focus of this design process is 
not to reduce the return loss but to reduce the size of the 
antenna, and this target has been achieved by reducing the 
size of the antenna by up to 65%. Numerical simulation results show 
that the proposed design possesses several desirable 
characteristics, for instance high bandwidth, low loss and 
improved directivity, compared to the RMPA alone. The CST- 
MWS software is used for design and simulation, and MS- 
Excel for verifying the metamaterial properties. 


Index Terms - Media with negative refractive index
(metamaterial), rectangular microstrip patch antenna
(RMPA), permittivity, permeability, NRW approach, Return
Loss.


1. INTRODUCTION 

In the last decade the demand for wireless communication
systems has grown drastically. To fulfil this requirement,
multifunction antennas have been designed for multipurpose
use. Progress in communication technology and extensive growth
in the wireless communication market and in user demand show the
need for compact, reliable and efficient wireless systems.
Integrating the whole transmitter and receiver system on a single
chip [1], [3] is the vision for future wireless systems. This
idea has the benefit of reducing cost and enhancing
system reliability. Antennas have always been considered
the largest components of integrated wireless systems;
consequently antenna miniaturization has become a necessary
task in achieving a favourable design for integrated wireless
systems. Moreover, compactness is an important aspect in wireless
communication, in addition to the improvement of other parameters
such as directivity, return loss and bandwidth [2]. These
characteristics can be achieved by covering microstrip patch
antennas with metamaterial structures [4], [5].

Researchers have been trying for years to reduce the
size of the antenna. It has been attempted in many ways and
different concepts have been proposed. Recently, metamaterial based
structures, originally proposed by Pendry, have opened the door
to new design strategies, where miniaturization and
compatibility with planar circuit technology are key aspects. In
the 21st century split ring resonators (SRRs), originally proposed
by Pendry [6], [7], have attracted great interest for the design
of negative permeability, negative permittivity and left-handed
(LH) effective media [5].

In the late sixties (1967), Victor Georgievich Veselago [5], a
Russian physicist, was the first researcher to present the
theory of metamaterials, which exhibit negative permittivity ε
and permeability μ [16] and are also known as media with a negative
refractive index or left-handed materials [11], [13]. In such a
material, he showed that the phase velocity would be anti-parallel
to the direction of the Poynting vector. This is contrary
to wave propagation in naturally occurring materials. In the
beginning of the 21st century, papers were published about the first
demonstrations of an artificial material that produced
a negative index of refraction (as discussed in the previous
paragraph). By 2007, research experiments involving
negative refractive index or metamaterial
properties had been conducted by many groups.


2. DESIGN METHODOLOGY 

All the design and simulation work has been done in
Computer Simulation Technology Microwave Studio (CST-MWS).
Microsoft Excel is used to verify the metamaterial that
enhances the properties of the RMPA. Initially, dimensions were
calculated for the operating resonant frequency, i.e. 2.05 GHz,
using the formulas shown below for the width and length of the
patch antenna:








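W = \frac{c}{2 f_r} \sqrt{\frac{2}{\varepsilon_r + 1}}    (1)
L = L_{eff} - 2\Delta L    (2)
Where,
L_{eff} = \frac{c}{2 f_r \sqrt{\varepsilon_{reff}}}    (3)
\frac{\Delta L}{h} = 0.412 \, \frac{(\varepsilon_{reff} + 0.3)(W/h + 0.264)}{(\varepsilon_{reff} - 0.258)(W/h + 0.8)}    (4)
\varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2} \left[ 1 + 12 \frac{h}{W} \right]^{-1/2}    (5)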





In the formulas above the symbols have their usual
meanings, e.g.

c = Velocity of light in free space,
f_r = Operating resonant frequency,
h = Height of the substrate,
ε_r = Substrate's dielectric constant,
ε_reff = Effective dielectric constant,
L_eff = Effective length.
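The dimension calculation can be sketched in a few lines of Python. This is only an illustration of the standard transmission-line design formulas for a rectangular patch (width, effective permittivity, effective length, length extension); the function name `patch_dimensions` is hypothetical, and the substrate permittivity of 4.4 used in the example call is an assumption, since the paper does not state the substrate material. Only the 2.05 GHz frequency and the 1.6 mm height come from the text.

```python
import math

C = 299_792_458.0  # velocity of light in free space, m/s


def patch_dimensions(f_r, eps_r, h):
    """Return (W, L, eps_reff, L_eff, delta_L) in SI units.

    f_r   : resonant frequency in Hz
    eps_r : substrate dielectric constant (assumed value, not from the paper)
    h     : substrate height in metres
    """
    # Patch width
    w = C / (2 * f_r) * math.sqrt(2 / (eps_r + 1))
    # Effective dielectric constant
    eps_reff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / w) ** -0.5
    # Effective length
    l_eff = C / (2 * f_r * math.sqrt(eps_reff))
    # Length extension caused by fringing fields
    delta_l = 0.412 * h * ((eps_reff + 0.3) * (w / h + 0.264)) / (
        (eps_reff - 0.258) * (w / h + 0.8)
    )
    # Physical patch length
    length = l_eff - 2 * delta_l
    return w, length, eps_reff, l_eff, delta_l


# Example: 2.05 GHz, 1.6 mm substrate, eps_r = 4.4 (assumed)
w, length, eps_reff, l_eff, delta_l = patch_dimensions(2.05e9, 4.4, 1.6e-3)
print(f"W = {w * 1e3:.2f} mm, L = {length * 1e3:.2f} mm")
```

With a different substrate the numbers change, so the printed values should be read as an example rather than as the dimensions used by the authors.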

After the dimension calculation, the design work was carried out.
A perfect electric conductor was used for the patch antenna
and for the ground plane, with the substrate placed between
patch and ground. The RMPA at 2.05 GHz
is shown in figure 1.





Figure 1: RMPA at a height of 1.6 mm from ground, at
2.05 GHz.

The simulation result of the patch shown in figure 1 is given in
graphical form in figure 2, with a return loss and
bandwidth of -10.1286 dB and 7.7 MHz respectively.

Figure 2: Simulation result of the RMPA with return loss of
-10.12 dB and bandwidth of 7.7 MHz at 2.05 GHz.

After the RMPA simulation, the metamaterial cover is
implemented over the patch antenna at a height of 3.2 mm
from the ground. The proposed metamaterial structure,
implemented as the cover of the antenna, is shown with its
dimensions in figure 3.





Figure 3: Proposed metamaterial structure at the height of 
3.2mm from ground. 

Implementing the metamaterial over the rectangular microstrip
patch antenna at a height of 3.2 mm from the ground improves on
the properties of the RMPA alone and reduces the size of the
antenna by shifting the lowest dip to a frequency other than the
original operating frequency, i.e. to 0.651 GHz. The size is
reduced by 65%. The simulation result with the metamaterial is
shown in figure 4.

Figure 4: Simulated result showing a return loss of
-27.19 dB and bandwidth of 10.82 MHz at 0.651 GHz.

A comparison of dimensions between the reduced patch antenna
using media with negative refractive index (operating
frequency 2.05 GHz) and the RMPA alone at 0.651 GHz is given in
tabular form below.

[Table 1 body not recoverable from the source. Columns:
Dimensions of RMPA alone at 0.651 GHz; Dimensions of RMPA using
metamaterial, working at 0.651 GHz. Rows include the cut length.]

Table 1: Comparison of Dimensions





After the comparison it is necessary to prove that the material
used here to reduce the size of the RMPA is a metamaterial. The
NRW (Nicolson-Ross-Weir) approach [14] is used to prove it. The
following formulas belong to the NRW approach:

\mu_r = \frac{2 c (1 - V_2)}{\omega d \, i \, (1 + V_2)}    (6)
\varepsilon_r = \mu_r + \frac{2 S_{11} c \, i}{\omega d}    (7)
Where,

V_2 = S21 - S11 or Voltage Minima,
ω = Frequency in Radian,
d = Thickness of the Substrate,
c = Speed of Light,
μ_r = Relative permeability,
ε_r = Relative permittivity.
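As a rough illustration of the spreadsheet calculation, the two NRW formulas can be evaluated directly on complex S-parameters. The sketch below assumes the forms mu_r = 2c(1 - V2) / (w d i (1 + V2)) and eps_r = mu_r + 2 S11 c i / (w d), with V2 = S21 - S11; the function name `nrw_parameters` and the sample S-parameter values are hypothetical and are not taken from the paper's data.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def nrw_parameters(s11, s21, freq_hz, d):
    """Return (mu_r, eps_r) as complex numbers.

    s11, s21 : complex S-parameters at one frequency point
    freq_hz  : frequency in Hz
    d        : substrate thickness in metres
    """
    omega = 2 * math.pi * freq_hz        # frequency in radians per second
    v2 = s21 - s11                       # "voltage minima" term
    mu_r = 2 * C * (1 - v2) / (omega * d * 1j * (1 + v2))
    eps_r = mu_r + 2 * s11 * C * 1j / (omega * d)
    return mu_r, eps_r


# Hypothetical sample point near 0.651 GHz with a 1.6 mm substrate
mu_r, eps_r = nrw_parameters(0.3 - 0.1j, 0.2 + 0.4j, 0.651e9, 1.6e-3)
print(mu_r.real, eps_r.real)
```

A structure is read as a metamaterial at a given frequency when the real parts of both returned values are negative, which is the check applied to the exported S11 and S21 columns.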
In the NRW approach, the proposed patch antenna with the
metamaterial structure is placed between two waveguide ports, one
on each side of the antenna along the X-axis, to calculate the S11
and S21 parameters. The Y and Z planes are defined as perfect
electric and perfect magnetic boundaries respectively. The wave
is then excited from port 1 toward port 2, i.e. from left to right.
After the simulation in CST-MWS, the S11
and S21 parameters were exported to MS Excel for
further calculation. In MS Excel, equations (6) and (7)
were used to verify that the structure is a metamaterial. The
results obtained using the NRW approach show negative
permeability and permittivity in figures 6 and 7 respectively.

Figure 5: Proposed metamaterial structure between
waveguide ports.

Figure 6: Permeability versus frequency graph obtained
from Excel software.

Figure 7: Permittivity versus frequency graph obtained
from Microsoft Excel software.

The tables generated for permittivity and permeability
using MS-Excel were too large; therefore Table 2
and Table 3 show the negative values of permeability and
permittivity only in the frequency range 0.6419-0.6539 GHz.

Frequency [GHz] | Permeability [μr] | Re[μr]
[...] | -31.9316370277838-14.5648307462409i | -31.93163703
[...] | [...] | 29.57592012
[...] | [...] | 27.30646544
[...] | [...] | -25.1190003
[...] | [...] | 23.00809378

Table 2: Sampled Values of Permeability at 0.651 GHz
Calculated on MS Excel Software.

Figure 8: Hardware of [...]

Figure 9: [...] return loss of [...] at 2[...]

Frequency [GHz] | Permittivity [εr] | Re[εr]
[...] | -37.1552405011718-24.6492429073745i | -37.1552405
[...] | -35.314195613912-25.2329994815107i | -35.31419561
[...] | -33.6325777383075-25.8021819587365i | -33.63257774
[...] | -32.0899516273428-26.3433936840906i | -32.08995163
[...] | -30.665474813861-26.8496952334226i | -30.66547481

Table 3: Sampled Values of Permittivity at 0.651 GHz
Calculated on MS Excel Software.

Figure 10: Incorporated metamaterial structure over patch
surface.





After proving the metamaterial, it was established that the
structure proposed to miniaturize the antenna is a metamaterial.
After this proof, hardware of the proposed design was constructed
and analyzed using a spectrum analyzer, and the results of the RMPA
alone and with the incorporated structure were compared.

3. CONCLUSION AND FUTURE SCOPE

By covering the RMPA with the metamaterial structure, the
frequency at which it shows its maximum power output or
lowest return loss is 0.651 GHz. Table 1 shows the comparison
of the RMPA designed alone and with metamaterial. An RMPA alone
at 0.651 GHz consumes a large area compared with an RMPA at
2.05 GHz. By using the metamaterial,
it became possible for the antenna designed for the 2.05 GHz
operating frequency to work at 0.651 GHz with 65%
less area and more accurate results [9][10]. Figures 2 and 4 show
the comparison of return loss and bandwidth of the RMPA alone
and with the metamaterial. It has been found that the return loss
of the proposed structure is improved by about 17 dB and the
bandwidth is increased by about 3 MHz. Figures 6 and 7 show the
negative values of permeability and permittivity at the operating
frequency of 0.651 GHz. This proves that the proposed design of
media with a negative refractive index is a metamaterial structure.

Figure 11: Analyzed result after negative media
incorporation showing return loss at 0.76 GHz.


The authors have presented in this letter a new design methodology
for creating highly miniaturized patch antennas by adding a single
layer that contains a combination of hexagonal and pentagonal
structures at a height of 3.276 mm on the RMPA. The size of the
antenna can be reduced significantly without affecting
bandwidth, with little effort and at low cost. The purpose of this
work is to produce a small, low cost antenna that can be used
for L band (1-2 GHz) applications. An even smaller antenna is
possible with the proposed design, but further
miniaturisation comes with a loss in radiation efficiency and
bandwidth that may prove undesirable.

Abstract - In this paper we present a reversible image
steganography technique based on the Slantlet transform (SLT)
and the advanced encryption standard (AES) method. The
proposed method first encodes the message using two source
codes, viz. Huffman codes and a self-synchronizing variable
length code known as T-code. Next, the encoded binary
string is encrypted using an improved AES method. The
encrypted data so obtained is embedded in the middle and
high frequency sub-bands, obtained by applying 2-level
SLT to the cover image, using a thresholding method. The
proposed algorithm is compared with existing techniques
based on the wavelet transform. The experimental results show
that the proposed algorithm can extract the hidden message and
recover the original cover image with low distortion. The
proposed algorithm offers acceptable imperceptibility and
security (two-layer security) and provides robustness against
Gaussian and Salt-n-Pepper noise attacks.


Index Terms - Reversible Steganography, DWT, SLT,
Thresholding scheme, PSNR, AES, Huffman codes, T-codes


1. INTRODUCTION

Data hiding or steganography is the art and science of hiding
information in a carrier medium (such as text, image, audio or
video) so that the existence of the hidden
information is concealed and its detection becomes difficult. There are
applications in which it is desirable to recover the original
cover from the stego-image without any distortion after the hidden
data has been extracted. There are many papers on reversible
steganography in the literature [12, 15, 20-24]. A summary of
such algorithms may be found in [2], [3].

The three basic requirements of a steganography algorithm are
imperceptibility, high embedding payload and security [9, 10,
16]. In organizations such as banking, commerce, diplomacy
and medicine, private communications are essential. Security is
an important issue in information technology nowadays.
Modern cryptography provides a variety of mathematical tools
for protecting privacy and security that extend far beyond the
ancient art of encrypting messages. However, for carrying out
confidential communication over public networks, simply
concealing the contents of a message using cryptography is
found to be inadequate, as it can still raise the suspicion of
eavesdroppers. People have found the solution to this problem
in steganography. Image steganography techniques may be
classified into two categories: reversible techniques, in which
the receiver wishes to retain the original cover image after
extracting the hidden message from the stego-image, and
irreversible techniques, in which the receiver's only objective is
to extract the hidden message from the stego-image. In the
medical profession and in law enforcement, not only must the
message be hidden and recovered perfectly, but the recovery of
the original image is also important for examination.
Authors have used distortionless or lossless technique as
synonyms for reversible technique. Xuan et al. [20-23] have
presented distortionless data hiding based on the integer wavelet
transform. Celik et al. [3] have proposed a reversible data
hiding method based on the idea of first compressing the portions
of the signal that are susceptible to embedding distortion and then
transmitting them as part of the embedded payload. Sushil Kumar and
S.K. Muttoo [12] have proposed a distortionless steganographic
algorithm based on the slantlet transform and shown that it
outperforms the DWT in terms of PSNR. Panda and
Meher [13] have shown that the Slantlet Transform (SLT) offers
superior compression performance compared to the
conventional DCT and DWT based approaches. Ni et al.
[12] presented a reversible data hiding algorithm based on
histogram shifting with a quite limited embedding payload. Tian
[17] proposed a high capacity reversible data hiding scheme
using difference expansion. Xian-ting Zeng et al. [24] have
proposed a lossless data hiding scheme using dynamic
reference pixels and multi-layer embedding. This scheme can
offer very high embedding capacity and low image
degradation.

In this paper, we propose a reversible image steganographic
method based on SLT. The proposed scheme offers higher
imperceptibility than the existing DWT based scheme, with
low image degradation. The use of T-codes is a plus point, as it
provides self-synchronization at the decoding stage and a layer of
security, since the receiver will need the decoding key (generated
at the time of encoding) to extract the original message at the
decoding stage. Another layer of security is added in the embedding
scheme by hiding the secret bits randomly, i.e., using a random
permutation of the sub-band coefficients. The advanced encryption
standard (AES) used in the scheme is one of the most powerful
techniques of cryptography and can be used as an integral
part of a steganographic system for better confidentiality and
security. Dilbagh Singh et al. [4] have proposed a private key
encryption technique that can be used for data security in
modern cryptosystems. Their technique uses the concept of
arithmetic coding and can also be combined with any
encryption system that works on floating point numbers.

The rest of the paper is organized as follows: Section 2
presents a review of the Slantlet transform. Section 3 briefly
introduces the thresholding algorithm applied for embedding in our
method. Section 4 presents the proposed
algorithms. The experimental results and their analysis are
presented in Section 5. Conclusions and future scope are
presented in Section 6.


2. SLANTLET TRANSFORM 

In image compression, wavelet transforms produce far fewer
blocking artifacts than the DCT. They are adopted in
JPEG2000. They also perform well in image de-noising.
However, the 2D wavelet transform is intrinsically a
tensor-product implementation of the 1D wavelet transform; it
provides a local frequency representation of image regions over a
range of spatial scales, but it does not represent 2D
singularities effectively. Therefore it does not work well in
retaining the directional edges in the image, and it is not
sufficient for representing contours that are not horizontal or
vertical. An orthogonal discrete wavelet transform with
approximation order two, i.e., with two zero moments and
improved time localization, known as the Slantlet transform (SLT),
was introduced by Ivan W. Selesnick [14] in 1999. It uses a
special case of a class of bases described by B. Alpert et al. [1],
the construction of which relies on Gram-Schmidt
orthogonalization. It is based on a filterbank structure
implemented in a parallel form, employing different filters for
each scale. In DWT, some of these parallel branches employ a
product of basic filters, as shown in figure 1(a). The Slantlet
filter branches, however, do not employ any product form of
implementation, as shown in figure 1(b), and hence SLT possesses
extra degrees of freedom. Ivan W. Selesnick [14] has shown
that, due to this property, SLT can be implemented with
filters of shorter support while maintaining desirable
characteristics such as orthogonality and an octave-band
characteristic, with two zero moments. For k=2, the iterated
filters of Daubechies are of length 10 and 4, whereas in the case of
SLT they are of length 8 and 4, i.e., the 2-scale SLT filterbank has
a filter length which is two samples less than that of a 2-scale
iterated D2 filterbank. This difference grows with the number
of stages. Though SLT has no tree structure like DWT, it can
be implemented efficiently with the same order of complexity as
DWT.

Data compression using a 2-scale SLT filterbank involves three
steps: transformation of the input signal using the SLT,
thresholding of the transformed coefficients and reconstruction of
the signal from the thresholded coefficients. G. Panda et al. [13]
have shown that SLT provides improved time localization compared
with the DCT and DWT.

Figure 1: (a) Two-scale iterated filterbank using the DWT
(b) Two-scale filterbank structure using the SLT.


Also, considering various compression parameters such as the
percentage of energy retained and the MSE of different PQ
signals, they observed that the accuracy of the
reconstruction of the SLT method is better than that of the DCT and
DWT, i.e., the SLT based compression technique yields better
performance compared to both the DCT and DWT. In the
compression scheme using SLT, the data is first applied to the
two-level filter structures H0(z), H1(z), H2(z) and H3(z). The
outputs of these filters are down-sampled by a factor of 4; these
are the transform coefficients of the input data, obtained after the
convolution of the original data with the filter
coefficients, as shown in figure 1. The transform coefficients
are then thresholded using a suitable parameter. The inverse
slantlet transform is performed on these thresholded
coefficients to reconstruct the original data.

Figure 2(a) is the 1-level decomposition obtained after
applying 1-D slantlet filters to the image 'Tulips.jpg' and
decomposing it into low (L) and high (H) sub-bands. Figure
2(b) shows the 2-level decomposition of the image 'lena.bmp' when
the 1-D slantlet filters are used first on the rows of the image and
then on the columns, resulting in the sub-bands HH, HL, LH and
LL respectively.

Figure 2: (a) 1-level Slantlet image of 'Tulips.jpg', and
(b) 2-level Slantlet image of 'lena.bmp'

Nagaraj B Patil et al. [11] have shown that as the threshold level
is increased, a better compression ratio and PSNR can be achieved
for the test data. It has been observed that most of the middle
and high frequency coefficients in the HL, LH or HH sub-bands
obtained from SLT are of low magnitude. As these bands
constitute 75% of all SLT coefficients, the highest payload can
be 0.75 bit per pixel (bpp). Table 1 lists the payload of four
different images (256x256) under different thresholds.
For 'Flower.jpg', if the threshold T is set to 8, the payload is
0.645 bpp. This shows that about 86% of the coefficients in the high
frequency sub-bands (0.645/0.75) are used for data hiding in the
threshold embedding technique.
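The payload figures can be reproduced with a short calculation. The sketch below counts the coefficients whose magnitude is below the threshold, since each of them carries one message bit, and divides by the number of image pixels. The function name `payload_bpp` and the toy coefficient values are hypothetical and are used only for illustration.

```python
def payload_bpp(high_subbands, image_pixels, threshold):
    """Payload in bits per pixel for threshold embedding.

    high_subbands : iterable of coefficient lists (e.g. HL, LH, HH)
    image_pixels  : total number of pixels in the cover image
    threshold     : threshold T; only coefficients with |x| < T carry a bit
    """
    usable = sum(
        1 for band in high_subbands for x in band if abs(x) < threshold
    )
    return usable / image_pixels


# Toy example: 8 "pixels", 6 of which fall in the high-frequency sub-bands
toy_bands = [[1, -3, 20], [0, 9, -8]]
print(payload_bpp(toy_bands, image_pixels=8, threshold=8))

# Share of the high-frequency coefficients used at 0.645 bpp,
# given that those sub-bands hold 75% of all coefficients
print(round(0.645 / 0.75, 2))
```

The second print shows where the 86% figure comes from: a payload of 0.645 bpp out of a maximum of 0.75 bpp.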


[Table 1 body not recoverable from the source.]

Table 1: Threshold vs payload





3. THRESHOLDING METHOD 

The threshold embedding method for lossless data hiding is
given by Xuan et al. [21]. We predefine a threshold value T. To
embed data into a high frequency coefficient of sub-band HH,
LH or HL, the absolute value of the coefficient is compared
with T. If the absolute value is less than the threshold, the
coefficient is doubled and the message bit is added as the LSB.
Otherwise no message bit is embedded; however, the coefficient is
shifted so that it cannot be confused with a coefficient that
carries a bit. The coefficients are modified as follows:

x' = 2x + b,        if |x| < T
x' = x + T,         if x \geq T
x' = x - (T - 1),   if x \leq -T

where T is the threshold value, b is the message bit, x is the
high frequency coefficient and x' is the corresponding modified
frequency coefficient.

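To recover the original image, each high frequency coefficient
can be restored to its original value by applying the following
formula:

x = \lfloor x'/2 \rfloor,   if -2T + 1 < x' < 2T
x = x' - T,                 if x' \geq 2T
x = x' + T - 1,             if x' \leq -2T + 1

The first case covers exactly the values 2x + b produced from
|x| < T, which lie between -2T + 2 and 2T - 1; in that case the
embedded bit is recovered as b = x' mod 2.

The figure provides an example of hiding the message s=101010
in a 3x3 block with T=10.

[Figure: embedding and extraction example for T=10, s=101010]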
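A minimal sketch of threshold embedding and its inverse on integer coefficients is given below. It follows the three-case rule described above, with one added assumption that is not stated in the paper: when the message is shorter than the number of usable coefficients, the remaining usable coefficients are padded with 0 bits so that every coefficient stays recoverable. The function names and the sample 3x3 block values are hypothetical.

```python
def embed(coeffs, bits, T):
    """Embed message bits into integer coefficients with threshold T."""
    stego, k = [], 0
    for x in coeffs:
        if abs(x) < T:
            b = bits[k] if k < len(bits) else 0  # assumption: pad with 0
            k += 1
            stego.append(2 * x + b)              # double and set the LSB
        elif x >= T:
            stego.append(x + T)                  # shift large positives up
        else:                                    # x <= -T
            stego.append(x - (T - 1))            # shift large negatives down
    if k < len(bits):
        raise ValueError("message longer than the available payload")
    return stego


def extract(stego, T):
    """Return (original coefficients, extracted bits)."""
    coeffs, bits = [], []
    for y in stego:
        if y >= 2 * T:
            coeffs.append(y - T)
        elif y <= -2 * T + 1:
            coeffs.append(y + T - 1)
        else:
            coeffs.append(y // 2)                # floor division
            bits.append(y % 2)                   # LSB is the message bit
    return coeffs, bits


# Hypothetical 3x3 block (flattened), T = 10, s = 101010
block = [3, -4, 12, -10, 0, 9, -9, 25, 7]
message = [1, 0, 1, 0, 1, 0]
stego = embed(block, message, T=10)
restored, recovered = extract(stego, T=10)
print(stego, restored == block, recovered == message)
```

Floor division and the modulo operator are used on purpose: in Python both round toward negative infinity, so negative coefficients are restored correctly as well.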





4. PROPOSED ALGORITHM 

The proposed reversible image steganography algorithm
embeds data into the first level high frequency sub-bands of the
cover image. Preprocessing is performed prior to data
embedding to ensure that no overflow/underflow takes place.
The stego-image carrying the hidden message is obtained after
taking the inverse Slantlet transform. Figure 3 is the flowchart
of the proposed data embedding method and Figure 4 is the
flowchart for hidden data extraction and original cover image
recovery.

The embedding algorithm is summarized as follows:

Algo: Embedding

Step 1. First obtain the secret data by applying the best T-codes
as a source encoder to the given input text/message.

Step 2. Apply the modified AES encryption algorithm [25] to
the compressed data.

Step 3. Apply pre-processing to prevent possible overflow
during embedding (e.g., mapping the grayscale values 0 to 255
into 15 to 240).

Step 4. Take the 8-bit greyscale image and decompose it into 4
sub-bands by applying 2-level SLT: one lowpass sub-band and 3
sub-bands for the horizontal and vertical directions,
viz., HL, LH and HH.

Step 5. Embed the data in the high horizontal and vertical
sub-bands of SLT using the thresholding method (taking threshold
value = 35).

Step 6. Obtain the stego-image by taking the inverse SLT of the
modified image of step 5.
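The pre-processing step can be illustrated as follows. The paper only says that the grayscale range 0 to 255 is replaced by 15 to 240; the linear rescaling used here is an assumption made for the sketch (clipping would be another reading), and the function name `compress_range` is hypothetical.

```python
def compress_range(pixels, low=15, high=240):
    """Linearly map 8-bit pixel values from [0, 255] into [low, high].

    pixels : list of rows, each a list of integers in 0..255
    """
    scale = (high - low) / 255
    return [[low + round(p * scale) for p in row] for row in pixels]


# The extremes land on the new bounds; mid-range values move only slightly
print(compress_range([[0, 128, 255]]))
```

Keeping a margin at both ends of the range leaves headroom for the coefficient changes made during embedding, so that the stego-image pixels stay within 0 to 255. Because several input values can map to the same output value, this particular mapping is not exactly invertible.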


The extraction algorithm is summarized as follows:

Algo: Extraction

Step 1. Apply SLT to the stego-image.

Step 2. Extract the secret data from the high horizontal and
vertical sub-bands of SLT using the inverse thresholding technique.

Step 3. Apply the improved AES decryption algorithm [21] to
the extracted codes to obtain the actual encoded T-codes.

Step 4. Obtain the original message by T-decoding the secret
data with the help of the encoding key.

Step 5. Recover the original image by removing the hidden
message from the stego-image.

Table 2: PSNR values based on Wavelet and SLT using
Huffman encoding (secret message = 5000 bits)





Figure 3: Block diagram of Embedding method


5. EXPERIMENTAL RESULTS AND ANALYSIS

To evaluate the performance of the proposed data hiding
algorithm, we have used 128x128 and 256x256 gray scale
images. Simulations are done using MATLAB 8.0. We have
compared the performance of the proposed steganographic
method, based on SLT with T-codes as encoder, improved
AES as encryption and the reversible thresholding technique for
embedding, with the corresponding steganographic method based
on the wavelet transform. We have tested a number of images,
including standard images and medical images. We have used the
PSNR metric for measuring the stego-image quality.

Imperceptibility

The imperceptibility measure used for image quality is
PSNR, given by

PSNR = 10 \log_{10} (255^2 / MSE)

MSE = (1/N) \sum_i \sum_j (x_{ij} - x'_{ij})^2

where x denotes the original pixel value, x' the corresponding
stego-image pixel value and N the number of pixels.
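The PSNR measure can be computed as in the following sketch, which works on plain nested lists of 8-bit pixel values. The function names `mse` and `psnr` are hypothetical; returning infinity for identical images is a convention chosen here, not something stated in the paper.

```python
import math


def mse(original, stego):
    """Mean squared error between two equally sized grayscale images."""
    n = sum(len(row) for row in original)
    total = sum(
        (a - b) ** 2
        for row_o, row_s in zip(original, stego)
        for a, b in zip(row_o, row_s)
    )
    return total / n


def psnr(original, stego):
    """PSNR in dB for 8-bit images: 10 * log10(255^2 / MSE)."""
    err = mse(original, stego)
    if err == 0:
        return math.inf  # identical images (convention for this sketch)
    return 10 * math.log10(255 ** 2 / err)


cover = [[0, 0], [0, 0]]
stego = [[0, 0], [0, 10]]
print(round(psnr(cover, stego), 2))
```

In the toy example one pixel out of four differs by 10, so the MSE is 25 and the PSNR is 10 log10(65025 / 25).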

Table 2 shows the test results for these methods using only
Huffman codes as encoder, Table 3 shows the test results using
only T-codes as encoder, Table 4 shows the results using
Huffman codes and improved AES encryption, and Table 5
shows the results using T-codes and modified AES encryption.
We have shown the results for four images:
I1: Cameraman.tif, I2: Lena.jpg, I3: Nature.jpg, and I4:
Scenery.jpg (see Figure 9).








Figure 4: Block diagram of Extraction method

[Bodies of Table 3 and Table 4 not recoverable from the source.]

Table 3: PSNR values based on Wavelet and SLT using T-code
encoding (secret message = 5000 bits)

Table 4: PSNR values based on Wavelet and SLT using
Huffman encoding and AES encryption (secret message =
5000 bits)

Robustness

In figures 5 to 8, we show bar diagrams comparing the
PSNR values for four images using the proposed algorithm
based on the slantlet transform, T-codes and the AES method, with
and without the addition of Gaussian noise (0.01), against
the corresponding algorithm for the wavelet transform.


Analysis
The PSNR results of the proposed method based on SLT
are compared with those of the wavelet transform based method
and are summarized in Tables 2 to 5.
The imperceptibility is found to be better in the SLT based
reversible thresholding algorithm than in the DWT based reversible
thresholding method.

IMAGE | WLT+TCODE+AES | SLT+TCODE+AES | WLT+TCODE+AES (adding Gaussian) | SLT+TCODE+AES (adding Gaussian)
[Table 5 body not recoverable from the source; one surviving value: 9.738723]

Table 5: PSNR values based on Wavelet and SLT using T-codes
encoding and AES encryption (secret message = 5000
bits)


The algorithm does not need the original image for recovering the
secret data (it is a blind data hiding scheme). The use of
T-codes provides self-synchronization in the decoding stage.
From the above tables it can be seen that SLT along with the
Huffman compression technique and the AES encryption method
has slightly better PSNR values than SLT along with T-codes
and the AES method.

Further, the SLT based steganographic method is robust to
Gaussian noise (the same results have been observed for salt and
pepper noise).





Figure 5: (1) WLT+Huff, (2) WLT+Huff+Gaussian (0.01),
(3) SLT+Huff, (4) SLT+Huff+Gaussian

Figure 6: (1) WLT+Tcode, (2) WLT+Tcode+Gaussian
(0.01), (3) SLT+Tcode, (4) SLT+Tcode+Gaussian


6. CONCLUSION AND FUTURE SCOPE 

In this paper we have presented

1. New variable length codes, viz. T-codes, for the
compression of the embedding message.

2. An improved AES for the encryption of the encoded
message.

3. SLT in place of DWT, as it provides better imperceptibility
and compression.

4. The reversible thresholding technique, so that one can
recover the original image from the stego-image.

The Slantlet transform is a wavelet-like transform that is a
better candidate for signal compression than the DWT
based scheme and provides better time localization.
Huffman codes have been preferred for data compression by
researchers. However, people have been searching for
self-synchronizing variable length codes since 1970. One of the best
self-synchronizing variable length codes that can replace the
Huffman code is the T-code [18-19]. We have applied these codes
for data compression in the proposed algorithm.

The T-codes are self-synchronizing codes shown to be better
than Huffman codes in the decoding process. They also provide
a layer of security in the system, as one needs the encoding key to
decode the secret message obtained from the extraction
process.

The use of encryption in steganography can lead to 'security in
depth'. To protect confidential data from unauthorized
access, the advanced encryption standard (AES) has been
suggested by researchers [5]. The AES algorithm is a very
secure cryptographic technique, and techniques that
use the frequency domain are considered highly secure when
combined with steganography.

The reversible threshold embedding technique is used for
embedding the secret message in the sub-bands of the transformed
image obtained from the cover object by applying 2-level
SLT, and the results are compared with data hiding techniques
based on the wavelet (biorthogonal CDF 9/7) transform.

Security

The integration of a compression technique (T-codes) and a
cryptographic technique (modified AES) with steganography
uses three keys (encoding key, encryption key and threshold
value), making the present algorithm a highly secure method.


Robustness

The proposed method not only provides acceptable image
quality but also shows almost no distortion in the stego-image
after adding Gaussian noise or Salt and Pepper noise. The use
of SLT has shown better results than DWT in terms of the image
metric PSNR and robustness.


Recovery

No artifact is observed in the stego-image, and the original
image is recovered from the stego-image with low image
degradation.


Embedding Payload
The embedding payload of the proposed embedding technique is
the same as in the case of the DWT techniques.

Figure 9: Cover images I1, I2, I3 and I4





Figure 7: (1) WLT+AES+Huff, (2) WLT+AES+
Huff+Gaussian (0.01), (3) SLT+AES+Huff, (4)
SLT+AES+Huff+Gaussian

Figure 8: (1) WLT+AES+Tcode, (2) WLT+AES+Tcode
+Gaussian (0.01), (3) SLT+AES+Tcode, (4) SLT+AES+
Tcode+Gaussian

Abstract - Cube based networks have received much attention
over the past decade since they offer a rich interconnected
structure with a number of desirable properties such as low
diameter, high bisection width, and lower complexity and cost.
Among them, the hypercube architecture is a widely used
network for parallel computer systems due to its low diameter.
The major drawback of hypercube based architectures is the
difficulty of their VLSI layout. Several variations of the hypercube
have also been reported, each designed by considering a
specific topological property. Nevertheless, no particular
topology claims to have better performance with all the
desirable topological properties. In this paper the
performance analysis of various interconnection networks is
presented. The performance is compared by considering cube
type architectures as well as linear type architectures on
different parameters such as degree, diameter, bisection
width, scalability and cost. The analysis indicates that
cube based architectures have a rich interconnected structure
with high cost and complexity. On the other hand, linear type
architectures are scalable, simpler and better in terms of cost
and complexity. The comparative study suggests various
aspects of the design of new multiprocessor architectures.


Index Terms - Interconnection network, Performance 


evaluation, Topological properties, Parallel systern, Cube 
Architectures 


1. INTRODUCTION 

In parallel and distributed systems, interconnection networks
play an important role in the overall performance of the system.
Deciding on the appropriate network is an important issue in the
design of parallel and distributed systems. In general,
determining the optimal network to implement any parallel
application does not have a known theoretical solution. There
are different ways to determine efficient topologies that trade
off high-level performance issues against various implementation
constraints [1]. A topology is evaluated in terms of a number of
performance parameters such as degree, diameter, bisection
width and cost. Several researchers have developed various
architectures which are considered better in terms of particular
parameters.
Some variations focus on the reduction of the diameter [10]
[18], while others focus on the design of simple routing and
communication algorithms [4]. Scalability is also an important
issue in evaluating the performance of interconnection networks.
However, it cannot be clearly stated which interconnection
network works best when all the parameters are considered.
In terms of complexity, interconnection networks may be
classified into two major categories. The first is cube based
architectures, which possess a rich interconnection topology.
The binary hypercube or n-cube has been a widely used
interconnection network in the design of parallel systems
[12]. Several variations of the hypercube architecture are
reported in the literature; some examples are the folded
hypercube (FHC), metacube (MC), folded metacube (FMC) and folded
dualcube (FDC) [8] [7] [12] [13] [15] [11]. The major drawback in
such networks is the increase in the number of communication
links for each node and the increase in the total number of
nodes in the system, which ultimately increases the complexity
of such interconnection networks [19] [20]. Therefore, there is
a need to carry out the performance analysis of various
interconnection networks by considering their topological
properties.


The second class of network is linearly extensible networks
such as the linear array, ring, linearly extensible tree and
linearly extensible cube [10] [16]. The complexity of these
networks is lower as they do not have exponential expansion.
Besides scalability, other parameters to evaluate the
performance of such networks are degree, number of nodes,
diameter, bisection width and fault tolerance. The main purpose
of this paper is to study and analyse the various multiprocessor
networks along with their properties to help in the design of a
new interconnection architecture. Selection of a better
interconnection network may benefit several applications through
lower complexity and improved power efficiency. One such
modern application is the network on chip (NoC) paradigm, where
different cores are embedded with appropriate connectivity.
Some examples include mesh, torus and star [1] [9].


In this paper, five cube based architectures as well as several
linearly extensible architectures are studied. Section 2
describes the various parameters used for the performance
analysis. The cube based architectures and their
characteristics are discussed in section 3. Similarly, the
comparative analysis of linearly extensible architectures is
carried out in section 4. A comparative study of both types of
architectures is made in section 5, and section 6 concludes the
paper.
2. PERFORMANCE PARAMETERS

The need for architectural performance evaluation exists from
the design phase to installation. The various parameters decide
the design alternatives and give a criterion of selection known
as the cost-performance trade-off [6] [3]. In general, the
performance of various architectures is measured by the
following parameters.

A. Degree (d) 

It is the connectivity among different nodes in a network, i.e.
the number of links per node. The connectivity of the nodes
determines the complexity of the network: the greater the
number of links in the network, the greater the complexity.

B. Diameter (D) 

It is defined as the maximum shortest path between any source
and destination node. The path length is measured by the
number of links traversed. This property is important in
determining the distance involved in communication and hence
the performance of parallel systems.

C. Bisection width (B) 

The bisection width of a network is the minimum number of 
edges whose removal will result in two distinct sub networks. 
Greater bisection width is better for a network to be fault 
tolerant. 

D. Cost (C) 

It is defined as the product of the diameter and the degree of
the node for the asymmetric network (i.e. Cost = D*d). This
factor is widely used in performance evaluation.

E. Extensibility 

This is the property which allows large systems to be built out
of small ones with minimum changes in the configuration of the
nodes. It is the smallest increment by which the system can be
expanded in a useful way.


In the present work the above parameters are compared for
different types of multiprocessor architectures. The values are
computed from mathematical formulas specific to each topology.
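The parameters above can also be checked numerically on a small network. The following is a minimal sketch, not the method used in the paper: it builds a binary hypercube as an adjacency list (the function names are illustrative assumptions) and measures degree, diameter and cost = D*d by breadth-first search, assuming a connected network.

```python
from collections import deque


def hypercube(n):
    """Adjacency list of the binary n-cube: two nodes are linked when
    their n-bit labels differ in exactly one bit."""
    return {v: [v ^ (1 << i) for i in range(n)] for v in range(1 << n)}


def eccentricity(adj, source):
    """Longest shortest-path distance (in links) from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def degree(adj):
    """Maximum number of links at any node."""
    return max(len(nbrs) for nbrs in adj.values())


def diameter(adj):
    """Maximum over all node pairs of the shortest path length."""
    return max(eccentricity(adj, v) for v in adj)


def cost(adj):
    """Cost as defined in the text: diameter times degree."""
    return diameter(adj) * degree(adj)


if __name__ == "__main__":
    for n in range(1, 6):
        q = hypercube(n)
        print(n, len(q), degree(q), diameter(q), cost(q))
```

Measuring the parameters by search rather than by formula is slower, but it gives an independent check on the closed-form values quoted for each topology.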


3. CUBE BASED ARCHITECTURES 

A. Hypercube 

The binary hypercube or n-cube has been one of the most
popular interconnection networks, having logarithmic diameter
[12]. Each node in this network is connected through
bidirectional asynchronous point-to-point communication links
to other nodes. The major drawback of the hypercube is the
increase in the number of communication links for each node
with the increase in the total number of nodes in the
system [17]. The hypercube has a high bisection width
$B = 2^{n-1}$ and has good fault tolerance capability.


B. Folded Hypercube 

The folded hypercube (FHC) is a standard hypercube with
some extra links established between its nodes [2]. A folded
hypercube of dimension n is called FHC (n). The FHC (n) is
constructed from a standard hypercube by connecting each
node to the unique node that is farthest from it. The FHC (n) is
a regular network of node connectivity (n+1), and the hypercube
of degree 3 is converted to an FHC network as shown in Figure 1.
The diameter of an FHC (n) is $\lceil n/2 \rceil$ and the
bisection width is $2^n/4$.





Figure 1: Folded Hypercube FHC (3)
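The construction of the FHC described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions (node labels are n-bit integers, n >= 2, and the helper names are hypothetical): each node gets its n hypercube links plus one link to its bitwise complement, which is the unique farthest node in the hypercube.

```python
from collections import deque


def folded_hypercube(n):
    """Adjacency list of FHC(n) for n >= 2: hypercube links plus one
    complementary link from each node to its bitwise complement."""
    mask = (1 << n) - 1
    adj = {}
    for v in range(1 << n):
        nbrs = [v ^ (1 << i) for i in range(n)]  # ordinary hypercube links
        nbrs.append(v ^ mask)                    # link to the farthest node
        adj[v] = nbrs
    return adj


def eccentricity(adj, source):
    """Longest shortest-path distance (in links) from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)


if __name__ == "__main__":
    for n in range(2, 7):
        fhc = folded_hypercube(n)
        node_degree = len(fhc[0])
        print(n, node_degree, diameter(fhc))
```

For the dimensions tried here, the measured degree is n+1 and the measured diameter agrees with the ceiling of n/2.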

C. Metacube 

The metacube (MC) is an interconnection network for very
large parallel computers. In this network, the number of nodes
is much larger than in a hypercube with a small number of links
per node [4] [14]. The metacube network shares many desirable
properties of the hypercube, such as small diameter, and
includes the dual-cube as a special case. The MC network has a
two-level cube structure: a high-level cube (classes)
represented by the dimension k and a low-level cube (clusters)
represented by the dimension m. An MC (k, m) network can connect
$2^n$ nodes, where $n = m2^k + k$, with (k+m) links per node.
The degree is $(n-k)/2^k + k$ and the bisection width of an
MC (k, m) is

pa 


D. Folded Metacube

The folded metacube (FMC) is an interconnection topology which
inherits some of the useful properties of the metacube and the
folded hypercube (FHC) [5]. The folded metacube is a graph G
(V, E), as shown in Figure 2, where V represents a set of
vertices and E represents a set of links. The graph is a
modification of the metacube. The diameter of the folded
metacube is 2(m+k)-1 and
the Bisection width of G is 277/2 + 2™***? 
Figure 2: Folded metacube FMC (3)

E. Folded Dualcube

The folded dualcube (FDC) is a cube based topology which
inherits some of the useful properties of the dualcube [8] and
the folded hypercube (FHC) [2]. The folded dualcube is
constructed by connecting each node to the node farthest from
it, and is shown in Figure 3.
Figure 3: Folded dualcube FDC (3)

The node connectivity of the folded dualcube is (n+3)/2, the
diameter is n-1 and the bisection width is $2^n/2$ [5]. The
various parameters of cube based architectures along with the
topological properties are summarized in Table 1.
Table 1: Various parameters of Cube based Architectures


4. LINEAR EXTENSIBLE ARCHITECTURES
A. Linear Array 
It is a one-dimensional network having the simplest topology,
with N nodes and N-1 communication links. The internal
nodes have degree 2 and the terminal nodes have degree 1.
The diameter is N-1, which is long for large N, and the
bisection width is 1. It is an asymmetric network.


B. Binary tree 

The binary tree is a scalable architecture with a constant node
degree and constant bisection width. In general, an n-level,
completely balanced binary tree should have $N = 2^n - 1$ nodes.
The maximum node degree is 3 and the diameter is 2(n-1).


C. Linearly Extensible Tree 
A binary tree type network topology has been reported [10] and
is shown in Figure 4. The Linearly Extensible Tree (LET)
architecture exhibits better connectivity and a smaller number
of nodes than cube based networks. The LET network has low
diameter, which reduces the average path length travelled by all
messages, and has a constant degree per node. The LET network
grows linearly in a binary tree like shape. In a binary tree the
number of nodes at level n is $2^n$, whereas in the LET network
the number is (n+1).


Figure 4: Linearly Extensible Tree (LET) network


D. Linearly Extensible Cube 

The Linearly Extensible Cube (LEC) network grows linearly
and possesses some of the desirable topological properties such
as small diameter [10], high connectivity, constant node degree
and high scalability. It has a constant expansion of only two
processors at each level of the extension while preserving all
the desirable topological properties. The LEC network can
maintain a constant node degree regardless of the increase in
size (i.e. number of nodes) of the network.

The number of nodes in the LEC network is 2*n for n>0, whereas
the number of nodes in the hypercube is $2^n$. The diameter of
network is LJ. It has a constant node degree 4. The LEC has 
a bisection width equal to N, as shown in Figure 5.





Figure 5: Linearly Extensible Cube (LEC) network 


E. Ring 

A ring is obtained by connecting the two terminal nodes of a
linear array with one extra link. A ring network can be uni- or
bidirectional, and it is symmetric with a constant node degree
of d=2. The diameter is $\lfloor N/2 \rfloor$ for a
bidirectional ring and N for a unidirectional ring. A ring
network has a constant bisection width of 2. The different
performance parameters of Linearly Extensible Architectures are
summarized in Table 2.
Table 2: Various parameters of Linearly Extensible
Architectures.
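The closed-form diameters quoted in this section for the linear array, the bidirectional ring and the balanced binary tree can be cross-checked by building each network and measuring it. The sketch below is illustrative only; the builder names and node numbering are assumptions, and the LET and LEC networks are not included because their exact wiring is not given in this section.

```python
from collections import deque


def linear_array(N):
    """N nodes in a line: node i is linked to i-1 and i+1 where they exist."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < N] for i in range(N)}


def ring(N):
    """Bidirectional ring of N >= 3 nodes."""
    return {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}


def complete_binary_tree(levels):
    """Balanced binary tree with 2**levels - 1 nodes, numbered from 1
    so that node i has parent i // 2 and children 2i and 2i + 1."""
    N = (1 << levels) - 1
    adj = {}
    for i in range(1, N + 1):
        nbrs = [c for c in (2 * i, 2 * i + 1) if c <= N]
        if i > 1:
            nbrs.append(i // 2)
        adj[i] = nbrs
    return adj


def eccentricity(adj, source):
    """Longest shortest-path distance (in links) from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)


def degree(adj):
    return max(len(nbrs) for nbrs in adj.values())


if __name__ == "__main__":
    for name, net in (("linear array", linear_array(8)),
                      ("ring", ring(8)),
                      ("binary tree", complete_binary_tree(4))):
        print(name, len(net), degree(net), diameter(net))
```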


5. COMPARATIVE STUDY OF VARIOUS ARCHITECTURES

For a multiprocessor network, parameters such as diameter,
degree, bisection width, cost, regularity and symmetry are
crucial and determine the performance of the network. To
compare the performance, we consider three important
parameters, namely the number of processors, diameter
and cost. Curves are plotted for each of the parameters for
both classes of interconnection networks. Figure 6 shows the
trend of the increasing number of processors at each level of
extension. It is observed that all the linearly extensible
architectures except the binary tree have a smaller number of
processors. Therefore, the complexity of linearly extensible
architectures is lower when they are expanded to higher levels.
Having fewer processors to implement a parallel algorithm is
always economical. On the other hand, the cube based
architectures have exponential expansion, which makes
the network highly complex. Figure 6 also shows that
among linearly extensible architectures, the LEC network
produces better results.
[Plot: Number of Processors of various networks (Linearly
Extensible Architectures)]

Figure 6: Performance of linearly extensible architectures
The second parameter in analyzing the performance of both
types of architectures is diameter. To analyse the diameter
of various networks, curves are plotted and shown in Figures 7
and 8. The curves show that the results for both types of
network are comparable. Among cube based architectures, the
folded hypercube has a smaller diameter compared to other cube
based architectures (Figure 7). Among linearly extensible
architectures, the LEC network has a smaller diameter compared
to other similar architectures.
Figure 7: Performance of Cube based architectures
The main parameter in evaluating the performance is
cost, which is defined as the product of the degree and the
diameter. Figures 9 and 10 depict the patterns of the cost
analysis of both classes of networks. Among cube based networks,
FHC has lower cost at higher levels compared to other
similar cubical architectures (Figure 9). Similarly, when
comparing the cost of linearly extensible architectures, Figure
10 shows that LET has lower cost compared to other
linear types of architectures.
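As a rough numerical illustration of the cost comparison among cube based networks, the sketch below evaluates Cost = D*d for the hypercube and the FHC. It assumes the standard hypercube values (degree n, diameter n), which are not stated explicitly in this paper, and the FHC values from section 3 (degree n+1, diameter equal to the ceiling of n/2); the function names are illustrative.

```python
import math


def hypercube_cost(n):
    """Cost = degree * diameter, assuming degree n and diameter n."""
    return n * n


def fhc_cost(n):
    """Cost = degree * diameter for FHC(n): (n + 1) * ceil(n / 2)."""
    return (n + 1) * math.ceil(n / 2)


if __name__ == "__main__":
    print("n  hypercube  FHC")
    for n in range(2, 11):
        print(n, hypercube_cost(n), fhc_cost(n))
```

Under these assumptions the gap widens as n grows, which is consistent with the observation that FHC has lower cost at higher levels.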
To draw a clear conclusion, the cost analysis is carried out
for those architectures which give better results in
their respective categories. When comparing the cost
of FHC and LET, it is observed that the LET network has lower
cost at higher levels compared to FHC. However, the results are
comparable.


[Plot: Cost of various size networks (Cube based
architectures); x-axis: Depth (Level Number)]

Figure 9: Performance of Cube based architectures


[Plot: Cost of various size networks (Linearly Extensible
Architectures)]

Figure 10: Performance of linearly extensible architectures


The bisection width is also an important parameter for
measuring the performance of multiprocessor architectures.
The bisection width in cube type architectures grows
exponentially. In the case of linearly extensible architectures
the bisection width is either constant or increases linearly
with the increase in number of processors. The linear increment
is not desirable, as the connections at higher levels of such
architectures do not seem to reflect the practical fault
tolerance capability of the network.


6. CONCLUSION AND FUTURE SCOPE 

In this paper the performance of various multiprocessor
architectures is analyzed by considering their topological
properties. A comparative study of cube based as well as
linearly extensible architectures is made. Among cube based
networks, it is found that the FHC gives better
performance in terms of diameter and cost. However, all the
cube based architectures have exponential expansion, which
increases the complexity of the system. If we limit the number
of processors in FHC, it can be considered the best
multiprocessor network, with a high degree of fault tolerance.
There is great scope to modify this network so that it can have
approximately all the desirable topological properties with
a smaller number of processors. As far as linearly extensible
architectures are concerned, they are less complex and easily
extensible. However, their common drawback is that they
have low bisection width, which is not a desirable property for
making the network fault tolerant.


An important issue in the design of multiprocessor systems is
how to design the interconnection network adequately in order
to achieve the desired performance at low cost. The choice of
the interconnection network may affect several characteristics
of the system such as node complexity, scalability and cost.
The present study is carried out on the basis of several
characteristics of various multiprocessor interconnection
networks. There has been much work related to the design of
appropriate multiprocessor networks; however, no one claims a
particular design which embodies all the desirable properties.
The present study gives more scope to design high performance
interconnection networks that can be used in the design of
multiprocessor servers.
Abstract - In this paper, a recent yet powerful technique for
classification of datasets is presented. The paper highlights
the importance of an ensemble approach over individual
classifiers to achieve better classification accuracy. In this
paper, a given dataset is divided into a number of parts to
constitute an ensemble. The ensemble combines these
classifiers. An unknown data pattern is tested on the ensemble.
Using bagging with the majority voting technique, the
performance of the ensemble is determined on different sections
of datasets. In the paper, six benchmark datasets are used for
investigation. Each dataset is trained with 80%, 60% and 50% of
the data patterns for classification. The number of classifiers
in an ensemble for each dataset is changed to 5, 7 and 9. As a
typical case, k-nearest neighbor (k-NN) classifiers are used
with the values of k varying as 1, 3 and 5. The classification
accuracies of individual classifiers and those of ensembles are
computed in each case. After extensive experiments with the
proposed scheme, taking random shuffling and selection of data
patterns for training and testing, it is observed that in every
case the classification accuracy obtained by the ensemble is
higher than that obtained by an individual classifier.


Index Terms - Classification, Ensemble of classifiers,
bagging, k-NN classifier.


1. INTRODUCTION 

There have been a significant number of research activities in 
the area of data analysis. The size of database keeps on 
increasing with useful or redundant data. The task of analysis 
of the data becomes complex due to presence of these 
redundant, mostly unwanted pieces of data, commonly called 
features in a formatted dataset. The role of a classifier is to 
divide a dataset on the basis of labels or classes of its patterns. 
In addition to classifying data patterns into different classes, it 
is also expected from a classifier to predict the label (or more 
often termed as class) of an unknown pattern, called test 
pattern. Classification has become a vital component of the 
study of pattern recognition [1]. Due to the huge amount of 
data piled up every moment on disks, web spaces and other 
storage devices, techniques like data mining [2,3], have 
become quite relevant. Classification is an important step of
data mining. Classification is one of the core challenging tasks
[4] in mining [5], pattern recognition [1] and bioinformatics [6].
The goal of classification [7,8] is to assign a new entity into a 
class from a pre-specified set of classes. 
A classifier needs to be trained before it is ready for
predicting the class of unknown patterns. A classifier can
learn in two ways: supervised and unsupervised. In supervised
learning, the class of every pattern is known in advance at the
time of training. In unsupervised learning, the class of the
training pattern is not given. Commonly, classifications are
based on classification models (classifiers) that are induced
from an exemplary set of pre-classified patterns.
Alternatively, the classification utilizes knowledge that is
supplied by an expert in the application domain. In a typical
supervised learning setting, a set of instances, also referred
to as a training set, is given. The labels of the instances in
the training set are known and the goal is to construct a model
in order to label new instances. An algorithm which constructs
the model is called an inducer, and an instance of an inducer
for a specific training set is called a classifier. There are
several well established classifiers such as Fisher's linear
discriminant analysis (LDA) [25], the naive Bayes classifier
[26], support vector machines (SVM) [27], k-nearest neighbor
[28], neural networks [29] and fuzzy classifiers [30, 40].

In many examples, the idea behind the construction of an
ensemble is to combine the classifiers after a weak or
imperfect training of individual classifiers. The ensemble so
obtained outperforms every individual classifier. In fact,
human beings tend to seek several opinions before making any
important decision. Before buying very costly items or taking
critical medical decisions, it is common practice to weigh the
individual opinions and combine them to reach a final decision
[9]. Recently, Mikel Galar et al. [10] reported that class
distribution, i.e., the proportion of instances belonging to
each class in a data-set, plays a key role in classification.
The imbalanced data-set problem occurs when one class, usually
the one that refers to the concept of interest (positive or
minority class), is under-represented in the data-set; in other
words, the number of negative (majority) instances outnumbers
the number of positive class instances [11].

The primary benefit of using ensemble systems is the reduction
of variance and the increase in confidence of the decision. Due
to many random variations in a given classifier model
(different training data, different initialization, etc.), the
decision obtained by any given classifier may vary
substantially from one training trial to another, even if the
model structure is kept constant. Then, combining the outputs
of several such classifiers by, for example, averaging the
output decisions, can reduce the risk of an unfortunate
selection of a poorly performing classifier. Another use of
ensemble systems includes splitting large datasets into smaller
and logical partitions, each used to train a separate
classifier. This can be more efficient than using a single
model to describe the entire data. The opposite problem, having
too little data, can also be handled using ensemble systems,
and this is where bootstrap-based ideas start surfacing:
generate multiple classifiers, each trained on a different
subset of the data, obtained through bootstrap re-sampling.
While the history of ensemble systems can be traced back to
some earlier studies such as [12,13], it is Schapire's 1990
paper [14] that is widely recognized as the seminal work on
ensemble systems. A few more references on data fusion and
combining classifiers are available in [15-21].

In this paper, a study of ensembles of classifiers is presented
using investigation on different datasets. It is important to
note here that there is little scope for comparison of the
proposed scheme with others available in the literature. The
reason is that in each ensemble of classifiers, the constituent
classifiers are well established classifiers, viz. neural
networks, fuzzy, k-NN etc. The performances of these individual
classifiers have already been widely reported in the literature
in several applications. For a simple implementation of the
proposed scheme, a k-NN classifier has been used in this paper
as the constituent classifier of the ensemble. The k-nearest
neighbor algorithm [28] is considered one of the simplest
machine learning algorithms. It should be added that the
objective here is not to discuss the strength of k-NN but to
investigate the performance of the ensemble, irrespective of
its constituent classifiers. However, a good survey on the k-NN
classifier can be found in [31].
The objective of this paper is to support the creation of an
ensemble with one or more of these classifiers as constituent
members and to show that the classification accuracy produced
by such an ensemble using the majority voting criterion is
always higher than that obtained by using an individual
classifier. This is supported by investigation on six
benchmark datasets.
The paper is organized as follows: Section II presents the
proposed ensemble scheme. Section III outlines a summary of the
datasets used in the experiments. The details of the
experiments and results are discussed in Section IV. Section V
addresses the strengths and weaknesses of the proposed
technique by comparing it with a few of the others reported.
Conclusions and future research prospects are reflected in
Section VI, followed by references.


2. PROPOSED ENSEMBLE ALGORITHM

In this paper, simple bagging without replacement of samples,
with majority voting [11,22,23], is used for the investigation
of the proposed scheme. The steps of the algorithm used in the
paper are given below.

Algorithm

Input: D, the given dataset consisting of N patterns; F, the
number of features in each pattern, each pattern being labeled
with a class c; C, the total number of classes in D; S, the
number of classifiers in the ensemble.

1. Partition the entire dataset D into two parts, the training
dataset $S_{tr}$ and the testing dataset $S_{ts}$. Each part
has the same number of features. Each pattern in these two
parts is labeled with one class out of C classes, thus

$S_{tr} \cup S_{ts} = D$

2. Make equal partitions of $S_{tr}$ such that all parts except
the last will have $|S_{tr}|/S$ patterns. The last part will
have $(|S_{tr}|/S + |S_{tr}| \% S)$ patterns, where % is the
modulus operation on integers. The ensemble will thus have S
classifiers, one for each part.

3. Invoke the k-nearest neighbor classifier [32] with k = 1.

4. Determine the classification accuracy, $C_A$, of each part
of the training data using k-NN against the same test dataset
$S_{ts}$. Find the average $C_A$ of all S classifiers.
Determine the maximum $C_A$ obtained among the S classifiers.

5. Shuffle dataset D and create new $S_{tr}$ and $S_{ts}$.

6. Iterate steps 2 to 5 I times. Find the average $C_A$ and the
maximum $C_A$ in these I iterations.

7. Take every pattern of $S_{ts}$ and pass it through all S
classifiers using bagging [11,22] and majority voting to
determine its class. Repeat the process I times. Calculate the
average and maximum $C_A$ of the ensemble ($EC_A$) in these I
trials.

8. Change the value of k (1, 3, 5).

9. Change the value of S (5, 7, 9).

10. Change the size of the training data (80%, 60% and 50%)
and, accordingly, the test data.
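The steps above can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB implementation: it uses only the Python standard library, a synthetic two-class dataset invented for the example (`toy_data`), and hypothetical function names. Ties in the vote are broken in favour of the class encountered first, which is an assumption not specified in the text.

```python
import math
import random
from collections import Counter


def knn_predict(train, x, k):
    """Label x by majority among its k nearest patterns (Euclidean distance).
    train is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def partition(train, S):
    """Step 2: S parts of len(train) // S patterns each; the last part
    also takes the len(train) % S leftover patterns."""
    size = len(train) // S
    parts = [train[i * size:(i + 1) * size] for i in range(S - 1)]
    parts.append(train[(S - 1) * size:])
    return parts


def accuracy(predicted, truth):
    """Percentage of patterns classified correctly."""
    return 100.0 * sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def run_once(data, train_frac, S, k, rng):
    """One shuffle/split: returns (mean C_A, max C_A, EC_A)."""
    data = data[:]
    rng.shuffle(data)                                  # step 5
    cut = int(train_frac * len(data))
    train, test = data[:cut], data[cut:]               # step 1
    truth = [label for _, label in test]
    # Steps 3-4: one k-NN classifier per part, all scored on the same test set.
    votes = [[knn_predict(part, x, k) for x, _ in test]
             for part in partition(train, S)]
    individual = [accuracy(v, truth) for v in votes]
    # Step 7: majority vote of the S classifiers for every test pattern.
    ensemble = [Counter(col).most_common(1)[0][0] for col in zip(*votes)]
    return sum(individual) / S, max(individual), accuracy(ensemble, truth)


def toy_data(rng, n_per_class=50):
    """Synthetic two-class data, used only to make the sketch runnable."""
    data = []
    for label, (cx, cy) in (("a", (0.0, 0.0)), ("b", (4.0, 4.0))):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    data = toy_data(rng)
    for k in (1, 3, 5):                                # step 8
        for S in (5, 7, 9):                            # step 9
            runs = [run_once(data, 0.8, S, k, rng) for _ in range(5)]
            print(k, S, sum(r[0] for r in runs) / 5, sum(r[2] for r in runs) / 5)
```

Step 10 corresponds to calling `run_once` with `train_frac` set to 0.8, 0.6 and 0.5 in turn.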


Order of algorithm: There has been a variety of work on the
analysis of k nearest neighbors [34,35]. In the simplest form
as used here [1,33], for k-NN, the order of search is
$O(kF\,t_s t_r)$, where F is the number of features
(dimensions) in each pattern, k is the number of nearest
neighbors, and Euclidean distance is used as the metric of
nearness between test point $t_s$ and training pattern $t_r$.
P is the preprocessing due to shuffling and partitioning of the
training (and testing) datasets and taking the majority
decision in the bag of S classifiers. For the complete
algorithm proposed here, the order of the algorithm may be
given as follows:
$O(kF\,t_s t_r + P)$
The algorithm is iterated for k as 1, 3, 5; S as 5, 7, 9; and
size of training dataset as 80%, 60% and 50%.
Fig. 1 shows the proposed scheme. In this figure, as a typical
example, five classifiers are placed in an ensemble. The parts
of the training data $S_1 \ldots S_5$ are used for creating
five classifiers $C_1 \ldots C_5$, one classifier for each
part. The CA of the ensemble is shown by $C_A$.


3. SUMMARY OF DATABASES 

Table 1 summarizes the data sets used for the experiments. The
data sets are well established and have been used in several
investigations. The details of each data set can be viewed in
the UCI Machine Learning Repository [24]. No particular
preference was applied in choosing the data sets for
investigation in this paper.

Table 1: Description of the Data Sets Used.


4. EXPERIMENTS AND RESULTS 

Proposed ensemble algorithm was run: on an i5 machine using 
MATLAB software. The purpose of the investigation was to 
focus the strength of proposed ensemble scheme over 
individual classifiers. The results obtained for the six classical 
databases are shown in Tables 2(A-F) for Iris, Wine, Bupa 
Liver, Thyroid, WBC( Wisconsin Breast Cancer) and Sonar 
datasets respectively. In each of these tables, first column: T, 
(training data size) indicates the part (in percent) of the 
database which will be used for training only whereas the 
remaining part (100 — T) will be used for testing. Three sizes 
for training have been used in the paper viz. 80%, 60% and 
50%, to reflect attitude of the proposed algorithm towards 
different parts of the data. The next column represents values 
of ‘k’, i.e. the k-th nearest neighbor from the testing data 
pattern. The measure of the distance is taken as Euclidean 
distance. Three values of ‘k’ (1,3 and 5), have been used for 
all these datasets. To apply bagging, each training dataset is 
divided into S number of sub sets. In the paper, S is set for 
three values: 5,7 and 9. In other words, number of classifiers 
in ensemble will be 5, 7 and 9 for each of the datasets. Thus 
for each dataset, a training part of the dataset (80/60/50 %), 
_ has S different subsets. For a typical training dataset with five 
folds or subsets 


Us UsUsUs =s» | 
<a 


shes: Sas, tana eag Bad tastings Da 
respectively, and D is the entire dataset. As a typical case, first 
experiment is conducted with S=5, k=1 and training data size
= 80% of the total dataset. The testing data (20%) remains
the unseen part of the dataset. In this case, each of the five
classifiers, S1...S5, is applied to its respective training data part,
e.g. the first classification accuracy CA is obtained using 1-NN
between S1 and the test dataset, the second CA between S2 and the
same test dataset, and so on. The mean (average) of these five CA
values is computed. The CA of the ensemble of classifiers is computed as
follows. Take the first test pattern from the test database and find its
class using the first nearest neighbor (1-NN) with S1, then find its
class with S2, S3, S4 and S5. The majority (mode) of the
classes so obtained in the five tests is the class accepted for
the ensemble. Repeat the exercise for all the patterns in the test
dataset and calculate the percentage of correctly classified patterns.
This gives the ECA of the ensemble. The execution time of the whole
process is also recorded. The whole exercise is repeated five times by
randomly shuffling the dataset. Compute the mean
classification accuracy, Mean CA, from these five iterations.
Also calculate the maximum value of CA, Max CA, in these five
iterations. Similarly compute the mean and maximum ensemble
accuracy, Mean ECA and Max ECA, over the five iterations.
Compute the mean time spent on one iteration. The mean
values are shown in Tables 2(A-F). The maximum values of the
classification accuracies in the five iterations are shown within
brackets in the same tables. This is shown by the first row of
the first main sub-column of the table with S=5. A similar exercise
is repeated for S=7 and 9. This completes row 1 of the table.
The values of k are then varied to 3 and 5. Then the training data size is
changed to 60% and 50% and exactly the same procedure is
adopted. Due to space limitations, the values in the tables are
rounded to two decimal places. Tables 2(A-F) are
enclosed in Annexure-1.
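
The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`euclidean`, `knn_predict`, `split_subsets`, `evaluate`) are hypothetical, the S subsets are assumed to be disjoint and of nearly equal size, and ties in a vote are broken in favor of the class seen first, none of which is stated in the paper.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Classify `query` by the majority label among its k nearest rows.

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def split_subsets(rows, s):
    """Divide rows into s disjoint, nearly equal subsets S1..Ss (assumption)."""
    return [rows[i::s] for i in range(s)]


def evaluate(data, train_pct=80, s=5, k=1, seed=0):
    """One iteration: shuffle, split T% / (100 - T)%, score members and ensemble.

    Returns (mean CA of the s individual classifiers, ECA), both in percent.
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * train_pct // 100
    train, test = rows[:cut], rows[cut:]
    subsets = split_subsets(train, s)

    member_hits = [0] * s
    ensemble_hits = 0
    for features, label in test:
        # One k-NN vote per subset, then the mode of the votes.
        votes = [knn_predict(sub, features, k) for sub in subsets]
        for i, vote in enumerate(votes):
            member_hits[i] += vote == label
        ensemble_hits += Counter(votes).most_common(1)[0][0] == label

    mean_ca = 100.0 * sum(member_hits) / (s * len(test))
    eca = 100.0 * ensemble_hits / len(test)
    return mean_ca, eca
```

Calling `evaluate(rows, train_pct=80, s=5, k=1)` on a list of `(features, label)` pairs corresponds to one cell of the first row of a results table; varying `s`, `k` and `train_pct` covers the remaining combinations.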

On observing Table 2(A), it is noted that for the iris data
set, for S=5, k=1, mean CA = 94.7 is the highest when individual
classifiers are considered. In this case mean ECA is 96.7. For
S=7, k=1, mean CA = 90.5 is the highest for individuals, whereas
mean ECA = 96.6. For S=9, mean CA = 93.3, mean ECA = 1...
Thus it is noted that mean ECA is in each case higher than
mean CA. In most cases, mean CA is the same as the maximum value
of CA. In some cases they differ: for 50% training data, S=7, k=1, mean CA
= 85.7 whereas max CA is 91.6. Similarly mean ECA is 88.5 and
max ECA = 92.6. There are a few more cases where the mean values
of CA are smaller than the maximum values of CA. A similar trend
is noted in all Tables 2(A-F).
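
To make the distinction between the mean and the maximum values concrete, the aggregation over the five shuffled iterations can be sketched as below. This is a sketch under assumptions: `summarize` and the callable `run_once` are hypothetical names, and `run_once(rng)` stands in for one complete shuffle, train and test cycle that returns a (CA, ECA) pair in percent.

```python
import random
from statistics import mean


def summarize(run_once, iterations=5, seed=0):
    """Repeat an experiment and report the mean and maximum CA and ECA.

    `run_once(rng)` is a hypothetical callable: it shuffles the dataset
    with `rng`, runs one train/test cycle, and returns (CA, ECA).
    """
    rng = random.Random(seed)
    results = [run_once(rng) for _ in range(iterations)]
    cas = [ca for ca, _ in results]
    ecas = [eca for _, eca in results]
    return {
        "mean_CA": round(mean(cas), 2),
        "max_CA": round(max(cas), 2),
        "mean_ECA": round(mean(ecas), 2),
        "max_ECA": round(max(ecas), 2),
    }
```

By construction the mean can never exceed the maximum, and the two coincide only when every iteration yields the same accuracy, which is consistent with the cases reported above.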

As another case, Table 2(E) can be quoted, which presents
results on the breast cancer (WBC) data. This dataset has 683
patterns divided into 444 and 239 patterns for class 1 and
class 2 respectively. The dataset has 9 features (attributes). With
nine (S=9) 1-NN classifiers, mean CA = 96.6 whereas max CA
= 97.9. The mean ECA = 98.5 with maximum ECA as 99.3%, a
better performance. The Sonar dataset has 208 patterns divided
into two classes having 97 and 111 patterns respectively. It has
60 features in each pattern. By observing Table 2(C), it is
noted that mean CA = 61.3, with 60% training data and 1-NN,
using nine classifiers (S=9), whereas mean ECA under similar
conditions is 71.1.

It is therefore observed from the study of all these tables that the
values of mean CA are always less than mean ECA. The
maximum values of CA in a few cases are greater than the mean
values of CA in five iterations. The reason for running the
experiments five times is to check the
performance of the classifiers under different
combinations of patterns in the training and test datasets. It
is again noted that each ensemble can contain any set
of similar classifiers or a combination of classifiers such as neural
networks, fuzzy, Bayesian, KNN etc. The contribution of the
paper is more towards showing the importance of the
ensemble with majority voting than to highlight the strength
of the constituent classifiers, which are undoubtedly proven in the
literature. That is why the classification accuracies of the
constituent classifiers are compared with that of the ensemble
and not with other constituent classifiers of the ensemble. As a
typical example, KNN is used in all cases.
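
Since the voting step does not depend on the kind of constituent classifier, it can be written against a generic interface. The sketch below is illustrative only: `Classifier`, `majority_vote` and `ensemble_accuracy` are hypothetical names, each classifier is assumed to be any callable that maps a pattern to a class label, and ties go to the class voted first.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Any callable mapping a feature vector to a class label (assumption).
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(classifiers: Sequence[Classifier],
                  pattern: Sequence[float]) -> Hashable:
    """Return the class predicted most often by the constituent classifiers."""
    votes = [clf(pattern) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def ensemble_accuracy(classifiers, test_set) -> float:
    """Percentage of (pattern, label) pairs the ensemble classifies correctly."""
    hits = sum(majority_vote(classifiers, x) == y for x, y in test_set)
    return 100.0 * hits / len(test_set)
```

A k-NN classifier, a neural network or a Bayesian classifier can each be wrapped as such a callable, so the same voting code serves a homogeneous or a mixed ensemble.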


5. DISCUSSIONS ON THE COMPARATIVE STUDY
OF PROPOSED TECHNIQUE
The proposed technique has been used for different
applications, e.g. in [36], researchers used an ensemble classifier
for fMRI data analysis. The proposed scheme has several strong
merits: there is a high possibility of getting better
classification accuracy from an ensemble than from an individual
classifier; the individual classifiers of the ensemble need not
be perfectly trained, as mostly these are weak learners, thereby
reducing the time and effort of training them; this is also
confirmed by the fact that when different sizes of the training dataset
are taken (80%, 60% and 50%), a good accuracy is still achieved; and there is
scope for feature selection and dimensionality reduction of
the dataset, since under different combinations of features the
ensemble can be expected to predict with reasonably good accuracy.
Although it is difficult to find a common platform to compare
the performance of the proposed technique with others used
in different contexts, a few results are discussed here
for the purpose.
For iris data, the accuracy obtained in [37] is 94.7 for the CBA
scheme and 96.6 for the Neural Network system, whereas with the
proposed technique it is 100% for 9 classifiers in the
ensemble with 80% training data and k as 1.
For thyroid data [7], the accuracy is 95% with time as 0.913
seconds. In the proposed scheme, the accuracy is 95.4 with S=4,
k=1, time = 0.50 seconds.
For the wine dataset, although the accuracy in [7] is 89%, the time
taken is 1.34 seconds, whereas in the proposed scheme the accuracy is
81.7 but the time is considerably less, 0.53 seconds (k=1, S=9, training
data T = 60%).
For WBC data, in [38], the classification accuracy is 90%
with a time of 48 seconds, whereas in the proposed
technique the accuracy is 98% (k=1, S=9, training data T =
80%) with time = 1.6 seconds.
For sonar data, the accuracy obtained in [39] is 81%, whereas
the accuracy obtained by the proposed technique is approximately
79% (k=1, S=5, training data T = 80%).
It is again emphasized that the proposed technique focuses on
the use and importance of an ensemble of classifiers and not of
an individual classifier.
One possible limitation of the proposed technique is that it does
not address or attempt to modify the original structure of any
individual constituent classifier. If a classifier is originally
not suitable for a particular dataset or for a specific nature
of data, the ensemble will by no means be able to improve its
performance. Moreover, for a large set of data such as micro
array gene data, the performance of the proposed technique is
yet to be tested.


6. CONCLUSION
In this paper, a recent yet important scheme of classification
has been presented. A classifier can produce good
classification accuracy for one dataset, but perform poorly
when presented with a different dataset or even a different section
of the same dataset. If, however, multiple classifiers are trained
on small sections of the database and are combined in the
form of an ensemble, then such an ensemble can produce
better classification accuracy. To justify it, six benchmark
datasets, iris, BUPA liver, thyroid, sonar, breast cancer and
wine, have been used for empirical study. The size of the
training part of each dataset is taken as 80%, 60% and 50%.
The number of classifiers in the ensemble is taken as 5, 7 and
9. The k-nearest neighbor has been used as classifier with the
values of k as 1, 3 and 5 under each case. Experiments were
conducted to evaluate the classification accuracies on all six
datasets. In order to provide diversity in training and testing
datasets, the experiments were iterated five times with
shuffling of the dataset. The mean and the maximum classification
accuracies of individual classifiers on each subset of the training
datasets were computed. The same were computed for the
ensemble of the classifiers using majority voting. The
results produced in these two cases show that the
classification accuracy of each individual classifier in general
is lower than the classification accuracy obtained by
their ensemble. Thus it is concluded from these investigations
that an ensemble is a good approach to determine the class of
an unseen data pattern. The scheme can be applied to many
other datasets. Moreover, other classifiers like neural network,
fuzzy etc. can be included in the ensemble. This study can also
be extended to explore the possibility of feature selection or
dimensionality reduction.
Table 2: Results obtained for six datasets used in ensemble of classifiers
(A) Iris data
(D) Thyroid data
Figure 1: Representation of ensemble algorithm for number of classifiers, S=5
considered to be a logical system, which is a generalization of multi-valued logic. A very important
distinguishing feature of Fuzzy Logic is that in Fuzzy Logic, everything is, or is allowed to be, a matter of
degree. Furthermore, the degrees are allowed to be fuzzy. In a broader sense, however, Fuzzy Logic is much
more than a logical system. In fact, Fuzzy Logic is a precise system of reasoning and computation in which
the objects of reasoning and computation are classes with unsharp boundaries. What is not widely
recognized, within the scientific community and the general public, is that Fuzzy Logic has become a vast
enterprise. There are over 280,000 papers in the literature with Fuzzy in title. There are 25 journals with
fuzzy in title. There are close to 25,000 Fuzzy-Logic-related patents issued or applied for in the United
States and Japan. There is a long list of applications ranging from digital cameras to fraud detection
systems. Particularly worthy of note, on one end, is the Fuzzy Logic subway system in Sendai, a city of over
1 million in Japan. On the other end, numerically, is Omron's 120 million fuzzy logic blood pressure meters.
Some critics have been saying that Fuzzy Logic is a passing fad. This assessment of Fuzzy Logic fails to
recognize that the world we live in is, in large measure, a world of fuzzy classes, and that science has much
to gain from shifting its foundation from classical Aristotelian logic to fuzzy logic.


It is on the above note that we wish to bring out a special issue, to celebrate the Golden Jubilee Year of
Fuzzy Logic, in the year 2014, the 50th year of the introduction of Fuzzy Logic by its Father, Lotfi A. Zadeh,
in the year 1965.


Original and unpublished research papers, based on theoretical or experimental works, are solicited for
publication in the Special Issue of BIJIT. Submission of a paper implies that the work described has not been
published previously (except in the form of an abstract or academic thesis) and is not under consideration
for publication elsewhere. Papers can be submitted electronically, after logging in at our portal and
accessing the submit paper link, available at http://www.bvicam.ac.in/bijit/SubmitPaper.asp up to 31st
August, 2013, with "Special Issue of BIJIT on Fuzzy Logic" being selected as Publication Type. E-mail
submission will not serve the purpose. Authors wishing to submit a paper to this Special Issue must refer
to the website, for paper structuring and formatting guidelines in detail, at
http://www.bvicam.ac.in/bijit/Basic Guidelines for Authors.asp.


BIJIT follows a double blind peer review system. All submitted papers are first assessed at editorial board
level on the basis of their technical suitability, scope of work and plagiarism. The corresponding authors of
qualifying submissions will be intimated for their papers to be double blind reviewed by at least two
experts on the basis of originality, novelty, clarity, completeness, relevance, significance and research
contribution. If recommended, the paper may undergo multiple cycles of review, before finally being
accepted. Final acceptance is based on the review remarks by the referees and the decision of the editorial
board. Publication of papers in BIJIT is FREE OF COST. We do not charge any publication fee from the
authors for the papers to be published in BIJIT.


Timeline for Special Issue

Submission Deadline : 31st August, 2013
First Notification : 31st October, 2013
Author Revision Due : 18th November, 2013
Notification of Acceptance, If Major Revision Required : 31st December, 2013
Accepted Papers Due for Editorial Review : 10th January, 2014
Final Acceptance Notification : 31st January, 2014
Tentative Date of Publication : March, 2014
BIJIT - BVICAM's International Journal of Information Technology


Paper Structure and Formatting Guidelines for Authors 


BIJIT is a peer reviewed refereed bi-annual research journal having ISSN 0973-5658, being published since 2009, in both Hard
Copy as well as Soft Copy. Two issues, January to June and July to December, are published every year. The journal intends to
disseminate original scientific research and knowledge in the field of, primarily, Computer Science and Information Technology
and, generally, all interdisciplinary streams of Engineering Sciences. Original and unpublished research papers, based on
theoretical or experimental works, are published in BIJIT. We publish two types of issues: Regular Issues and Theme Based
Special Issues. Announcement regarding special issues is made from time to time, and once an issue is announced to be a Theme
Based Special Issue, the Regular Issue for that period will not be published.


Papers for Regular Issues of BIJIT can be submitted round the year. After the detailed review process, when a paper is finally
accepted, the decision regarding the issue in which the paper will be published will be taken by the Editorial Board, and the author
will be intimated accordingly. However, for Theme Based Special Issues, a time bound Special Call for Papers will be announced
and the same will be applicable for that specific issue only.


Submission of a paper implies that the work described has not been published previously (except in the form of an abstract or
academic thesis) and is not under consideration for publication elsewhere. The submission should be approved by all the authors of
the paper. If a paper is finally accepted, the authorities, where the work had been carried out, shall be responsible for not
publishing the work elsewhere in the same form. A paper, once submitted for consideration in BIJIT, cannot be withdrawn unless
the same is finally rejected.


1. Paper Submission
Authors will be required to submit MS-Word compatible (.doc, .docx) papers electronically after logging in at our portal and
accessing the submit paper link, available at http://www.bvicam.ac.in/bijit/SubmitPaper.asp. Once the paper is uploaded
successfully, our automated Paper Submission System assigns a Unique Paper ID, acknowledges it on the screen and also
sends an acknowledgement email to the author at her / his registered email ID. Consequent upon this, the authors can check
the status of their papers at the portal itself, in the Member Area, after login, and can also submit a revised paper, based on the
review remarks, from the Member Area itself. The authors must quote / refer the paper ID in all future correspondences. Kindly
note that we do not accept E-mail submission. To understand the detailed step by step procedure for submitting a paper,
click at http://www.bvicam.ac.in/BIJIT/guidelines.asp.


2. Paper Structure and Format

While preparing and formatting papers, authors must conform to the under-mentioned MS-Word (.doc, .docx) format:-

• The total length of the paper, including references and appendices, must not exceed six (06) Letter Size pages. It should
be typed on one side with double column, single-line spacing, 10 font size, Times New Roman, in MS Word.

• The Top Margin should be 1", Bottom 1", Left 0.6", and Right 0.6". Page layout should be portrait with 0.5" Header and
Footer margins. Select the option for different Headers and Footers for Odd and Even pages and different for First page in
Layout (under Page Setup menu option of MS Word). Authors are not supposed to write anything in the footer.

• The title should appear in single column on the first page in 14 font size, below which the name of the author(s), in bold,
should be provided centrally aligned in 12 font size. The affiliations of all the authors and their E-mail IDs should be
provided in the footer section of the first column, as shown in the template.

• To avoid unnecessary errors, the authors are strongly advised to use the "spell-check" and "grammar-check" functions of
the word processor.

• The complete template has been prepared, which can be used for paper structuring and formatting, and is available at
http://www.bvicam.ac.in/BL Downloads/Template For Full Paper B pdf.


• The structure of the paper should be based on the following details:-


Essential Title Page Information

• Title: Title should be concise and informative. Avoid abbreviations and formulae to the extent possible.

• Authors' Names and Affiliations: Present the authors' affiliation addresses (where the actual work was done) in the footer
section of the first column. Indicate all affiliations with a lower-case superscript letter immediately after the author's name
and in front of the appropriate address. Provide the full postal address of each affiliation, including the country name and
e-mail address of each author.

• Corresponding Author: Clearly indicate who will handle correspondence at all stages of refereeing and publication.
Ensure that phone numbers (with country and area code) are provided, in addition to the e-mail address and the complete
postal address.


Abstract

A concise abstract not exceeding 200 words is required. The abstract should state briefly the purpose of the research, the
principal results and major conclusions. References and non-standard or uncommon abbreviations should be avoided. As the last
paragraph of the abstract, 05 to 10 Index Terms, in alphabetic order, under the heading Index Terms (Index Terms - .....)
must be provided.


NOMENCLATURE

Define all the abbreviations that are used in the paper and present a list of abbreviations with their definitions in the Nomenclature
section. Ensure consistency of abbreviations throughout the article. Do not use any abbreviation in the paper which has not
been defined and listed in the Nomenclature section.


Subdivision - numbered sections 

Divide paper into numbered Sections as 1, 2, 3, ...... and its heading should be written in CAPITAL LETTERS, bold faced. 
The subsections should be numbered as 1.1 (then 1.1.1, 1.1.2, ...), 1.2, etc. and its heading should be written in Title Case, 
bold faced and should appear in separate line. The Abstract, Nomenclature, Appendix, Acknowledgement and References will 
not be included in section numbering. In fact, section numbering will start from Introduction and will continue till Conclusion. 
All headings of sections and subsections should be left aligned. 


INTRODUCTION
State the objectives of the work and provide an adequate background, with a detailed literature survey or a summary of the
results.


Theory/Calculation 
A Theory Section should extend, not repeat the information discussed in Introduction. In contrast, a Calculation Section 
represents a practical development from a theoretical basis. 


RESULT
Results should be clear and concise.


DISCUSSION
This section should explore the importance of the results of the work, not repeat them. A combined Results and Discussion
section is often appropriate.


CONCLUSION AND FUTURE SCOPE 
The main conclusions of the study may be presented in a short Conclusion Section. In this section, the author(s) should also 
briefly discuss the limitations of the research and Future Scope for improvement. 


APPENDIX 

If there are multiple appendices, they should be identified as A, B, etc. Formulae and equations in appendices should be given 
separate numbering: Eq. (A.1), Eq. (A.2), etc.; in a subsequent appendix, Eq. (B.1) and so on. Similar nomenclature should be 
followed for tables and figures: Table A.1; Fig. A.1, etc. 


ACKNOWLEDGEMENT
If desired, authors may provide acknowledgements at the end of the article, before the references. The organizations /
individuals who provided help during the research (e.g. providing language help, writing assistance, proof reading the article,
sponsoring the research, etc.) may be acknowledged here.


REFERENCES

Citation in text 

Please ensure that every reference cited in the text is also present in the reference list (and vice versa). The references in the 
reference list should follow the standard IEEE reference style of the journal and citation of a reference. 


Web references

As a minimum, the full URL should be given and the date when the reference was last accessed. Any further information, if
known (DOI, author names, dates, reference to a source publication, etc.), should also be given. Web references can be listed
separately (e.g., after the reference list) under a different heading if desired, or can be included in the reference list as well.


Reference style

Text: Indicate references by number(s) in square brackets in line with the text. The actual authors can be referred to, but the
reference number(s) must always be given. Example: '..... as demonstrated [3,6]. Barnaby and Jones [8] obtained a different
result .....'
List: Number the references (numbers in square brackets) in the list, according to the order in which they appear in the text.
Two sample examples, for writing the reference list, are given hereunder:-


Reference to a journal publication:
[1] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, "Secure spread-spectrum watermarking for multimedia", IEEE
Transactions on Image Processing, Vol. 6, No. 12, pp. 64 - 69, December 1997.


Reference to a book:
[2] J. G. Proakis and D. G. Manolakis, Digital Signal Processing - Principles, Algorithms and Applications; Third Edition;
Prentice Hall of India, 2003.


Mathematical Formulae
Present formulae using Equation editor in the line of normal text. Number consecutively any equations that have to be referred
to in the text.


Captions and Numbering for Figure and Tables 

Ensure that each figure / table has been numbered and captioned. Supply captions separately, not attached to the figure. A 
caption should comprise a brief title and a description of the illustration. Figures and tables should be numbered separately, 
but consecutively in accordance with their appearance in the text. 


3. Style for Illustrations
All line drawings, images, photos, figures, etc. will be published in black and white, in Hard Copy of BIJIT. Authors will need
to ensure that the letters, lines, etc. will remain legible, even after reducing the line drawings, images, photos, figures, etc. to a
two-column width, as much as 4:1 from the original. However, in Soft Copy of the journal, line drawings, images, photos,
figures, etc. may be published in colour, if requested. For this, authors will need to submit two types of Camera Ready Copy
(CRC), after final acceptance of their paper, one for Hard Copy (compatible to black and white printing) and another for Soft
Copy (compatible to colour printing).


4. Referees 
Please submit, with the paper, the names, addresses, contact numbers and e-mail addresses of three potential referees. Note 
that the editor has sole right to decide whether or not the suggested reviewers are to be used. 


5. Copy Right
Copyright of all accepted papers will belong to BIJIT and the author(s) must affirm that accepted Papers for publication in
BIJIT must not be re-published elsewhere without the written consent of the editor. To comply with this policy, authors will
be required to submit a signed copy of Copyright Transfer Form, available at in/bijit/Downlo ISIT-Copyright-Agreement.pdf
after acceptance of their paper, before the same is published.


6. Final Proof of the Paper
One set of page proofs (as PDF files) will be sent by e-mail to the corresponding author or a link will be provided in the e-mail
so that the authors can download the files themselves. These PDF proofs can be annotated; for this you need to download
Adobe Reader version 7 (or higher) available free from http://get.adobe.com/reader. If authors do not wish to use the PDF
annotations function, they may list the corrections and return them to BIJIT in an e-mail. Please list corrections quoting line
number. If, for any reason, this is not possible, then mark the corrections and any other comments on a printout of the proof
and then scan the pages having corrections and e-mail them back, within 05 days. Please use this proof only for checking the
typesetting, editing, completeness and correctness of the text, tables and figures. Significant changes to a paper that has been
accepted for publication will not be considered at this stage without prior permission. It is important to ensure that all
corrections are sent back to us in one communication: please check carefully before replying, as inclusion of any subsequent
corrections cannot be guaranteed. Proofreading is solely the authors' responsibility. Note that BIJIT will proceed with the
publication of the paper, if no response is received within 05 days.
Subscription Order Form
Please find attached herewith Demand Draft No. dated
For Rs. drawn on Bank
in favor of Director, "Bharati Vidyapeeth's Institute of Computer Applications and
Management (BVICAM), New Delhi" for a period of 01 Year / 03 Years


Subscription Details
Name and Designation
Organization
Mailing Address
PIN/ZIP
Phone (with STD/ISD Code) FAX
E-Mail (in Capital Letters)
Date: Signature
Place: (with official seal)
Filled in Subscription Order Form along with the required Demand Draft should be sent to the
following address:-
Prof. M. N. Hoda
Editor-in-Chief, BIJIT
Director, Bharati Vidyapeeth's
Institute of Computer Applications & Management (BVICAM)
A-4, Paschim Vihar, Rohtak Road, New Delhi-110063 (INDIA).
Tel.: 91-11-25275055 Fax: 91-11-25255056 E-Mail: bijit@bvicam.ac.in


Visit us at: www.bvicam.ac.in/bijit





About Bharati Vidyapeeth: Bharati Vidyapeeth, which is our parent
body, was established on 10th May 1964 by Hon'ble Dr. Patangrao
Kadam with a wider objective of "Transformation through dynamic
education". Under the leadership of the Hon'ble Founder, Bharati
Vidyapeeth has made astonishing strides in the field of education, during
a short span of 49 years, with a network of more than 188 institutions all
over India. Acknowledging the excellence, the Ministry of HRD, Govt. of
India, on the recommendation of UGC, New Delhi, accorded the
status of a Deemed to be University to eleven faculties of Bharati
Vidyapeeth in April 1996. By now, Bharati Vidyapeeth University, Pune,
has 32 institutions / colleges as its constituent units. Besides making a
contribution to the intellectual awakening, its activities have been geared
to bring multidimensional progress and welfare to different sections of the
population including women, tribal and rural people.
It has state of the art infrastructural and instructional
facilities, comparable to the best in the world. BVICAM
also contributes effectively towards providing
excellent opportunities for teaching and research
activities by organizing several National Seminars,
Conferences, Symposiums, Faculty Development
Programmes, Workshops and publication of Research
Journals. In this sequel, BVICAM feels proud to release
the ninth issue of BVICAM's International Journal of
Information Technology (BIJIT). BIJIT has been indexed at
EBSCO (USA), Cabell's Directory (USA), DOAJ (Sweden),
Google Scholar, Open J-Gate and many more.
About BVICAM: Bharati Vidyapeeth's
Institute of Computer Applications and
Management (BVICAM), New Delhi, was
established by Bharati Vidyapeeth, Pune, in the
year 2002. BVICAM is a reputed and most sought
after institute for the MCA programme in north India.
It is approved by the All India Council for Technical
Education, New Delhi, and is affiliated to Guru
Gobind Singh Indraprastha University (GGSIPU),
Kashmere Gate, Delhi. Presently, it runs a 03 year
Master of Computer Applications (MCA)
Programme. BVICAM is centrally located at
National Highway No. 10, Rohtak Road, A-4,
Paschim Vihar, New Delhi, in its own state of the art
sprawling campus.
All correspondences related to the conference
must be sent to the address:-


Prof. M. N. Hoda
General Chair, INDIACom-2014
Director, BVICAM, A-4, Paschim Vihar, New Delhi-63 (INDIA)
Tel.: 91-11-25275055, TeleFax: 91-11-25255056, 09212022066 (Mobile)
INDIACom-2014


8th INDIACom; 2014 International Conference on
Computing for Sustainable Global Development
(05th-07th March, 2014)
IEEE Conference Record Number # 32558

INDIACom-2014 is aimed to invite original research papers in the field of, primarily, Computer Science and
Information Technology and, generally, all interdisciplinary streams of Engineering Sciences, having central ...


INDIACom-2014 is an amalgamation of four different International conferences which will be organized
parallel to each other, as parallel tracks. These are listed below:-
Track #1: International Conference on Sustainable Computing (ICSC-2014)
Track #2: International Conference on High Performance Computing (ICHPC-2014)
Track #3: International Conference on High Speed Networking & Information Security
(ICHNIS-2014)
Track #4: International Conference on Software Engineering & Emerging Technologies
(ICSEET-2014)


INDIACom-2014 will be held at Bharati Vidyapeeth, New Delhi (INDIA). The conference will provide a
platform for technical ... within the research community and will encompass ...
presentation ...


Full length original and unpublished research papers based on theoretical or experimental contributions related to
the following topics, but not limited to, are solicited for presentation and publication in the conference:-


• Energy Efficient Systems
• IT for Education, Health & ...
• IT for Environmental ...
• IT for Sustainable Agriculture
• IT for Water Resources Management
• Consumers' Right
• IT for Disaster Management and Remote Sensing
• IT for other day to day problems
• E-Governance
• Knowledge Management
• E-Commerce, ERP, CRM & Knowledge Mining
• Technology for Convergence
• Distributed and Cloud Computing
• Parallel, Multi-core and Grid Computing
• Reconfigurable Architectures
• Changing Software Architectural Paradigms
• Programming Practices & Coding Standards
• Software Inspection, Verification & Validation
• Software Sizing and Estimation Techniques
• Agile Technologies
• Artificial Intelligence and Neural Networks
• Computer Vision, Graphics, and Image Processing
• Modeling and Simulation
• Embedded Systems and Robotics
• Human Computer Interaction
• Databases
• Data Mining and Business Intelligence
• Big Data Analytics
• Operating Systems
• Data Communication, Computer Networks and Information Security
• Wireless ...
• Network Monitoring Tools
• Next Generation Internet
• Mobile Computing
• Entertainment Technologies
• Multimedia Computing
• Information and Collaboration Systems
• Fuzzy and Soft Computing
• Bioinformatics
• Medical Informatics
• Education Informatics
• Computational Finance
• Research Methods for Computing
• Case Studies & Applications
Paper Submission


Authors from across different parts of the world are invited to submit their papers. Authors wishing to
... Authors should submit their papers
... Unregistered authors should first
create an account on ... to log on and submit paper. Only
...


Review Process, Publication and Indexing


The conference aims at carrying out two rounds of review process. In the first round, the papers submitted
by the authors will be assessed on the basis of their technical suitability, scope of work and plagiarism. The
corresponding authors of qualifying submissions will be intimated for their papers to be double blind ...

... proceedings,
serials. Conference proceedings will also be available in the form of CD-ROMs. All accepted papers, which will
be presented in the conference, will be submitted for inclusion to IEEE Xplore, as a part of IEEE's
Conference Publication Programme, subject to their terms and conditions. Further details are available
at www.bvicam.ac.in/indiacom.


Important Dates


Submission of Full Length Paper : 04th November, 2013
Paper Acceptance Notification : 15th January, 2014
Submission of Camera Ready Copy (CRC) of the Paper : 20th January, 2014
Registration Deadline (for inclusion of Paper in ...) : 31st January, 2014